Abstract: We examine and compare several different classes
of "balanced" block codes over q-ary alphabets, namely symbol- 
balanced (SB) codes, charge-balanced (CB) codes, and polarity- 
balanced (PB) codes. Known results on the maximum size 
and asymptotic minimal redundancy of SB and CB codes are 
reviewed. We then determine the maximum size and asymptotic 
minimal redundancy of PB codes and of codes which are both 
CB and PB. We also propose efficient Knuth-like encoders and 
decoders for all these types of balanced codes. 

Index Terms — coding theory, balanced codes, modulation 
codes, asymptotic redundancy 

I. Introduction 

There are several different classes of block codes over a q-ary 
integer alphabet that can be described as being "balanced" in 
some sense. Consider, for example, the symmetric alphabets
$\mathcal{A}_q = \{-q+1, -q+3, -q+5, \ldots, q-3, q-1\}$ that arise
in the context of pulse amplitude modulation (PAM), e.g.,
$\mathcal{A}_4 = \{-3, -1, +1, +3\}$ and $\mathcal{A}_5 = \{-4, -2, 0, +2, +4\}$. We
say that a code is symbol-balanced (SB) over $\mathcal{A}_q$ if, in each
codeword, all q alphabet symbols appear equally often. A 
charge-balanced (CB) code is one in which the sum of the 
symbols in each codeword is zero. We also define polarity- 
balanced (PB) codes, for which, in every codeword, the 
number of positive symbols equals the number of negative 
symbols. For q odd, this definition does not constrain the 
number of zero symbols. 

It is easy to see that for q = 2, i.e., for bipolar sequences
of even length n, these three notions of being "balanced"
are completely equivalent. For q = 3, i.e., for sequences
over the alphabet $\{-2, 0, +2\}$, the notions of CB and PB
are equivalent, but the SB sequences form a proper subset of
the set of CB and PB sequences. For example, the sequence
$(-2, -2, +2, 0, -2, +2, +2, +2, -2)$ of length 9 is CB and
PB, but not SB. For $q \geq 4$, all three notions are mutually
distinct. Any sequence which is SB is also CB and PB, but
there do exist sequences which are PB but not CB (e.g.,
$(-3, -1, +1, +1)$ over $\mathcal{A}_4$) and sequences which are CB but
not PB (e.g., $(+3, -1, -1, -1)$ over $\mathcal{A}_4$). Furthermore, there
exist sequences which are both CB and PB (denoted as CPB)
but not SB (e.g., $(-3, -3, +3, +3)$ over $\mathcal{A}_4$). In conclusion,
the general relationship among the balancing criteria discussed
above can be represented by the Venn diagram shown in Fig. 1.

Fig. 1. Relationships among the symbol-balanced (SB), charge-balanced
(CB), polarity-balanced (PB), and charge & polarity-balanced (CPB)
properties.

Balanced codes have found applications in digital commu-
nications and data storage technology [?]. They have been
widely studied in the literature, particularly for the binary
case, e.g., [?], [?], [?], [?], [?], [?]. Some constructions
also take into account error correction capabilities, e.g., [?],
[13], [20], [22]. Results for non-binary alphabets have been
presented for the SB and CB cases, albeit under different (or
no specific) names, e.g., [?] (SB) and [?], [?] (CB). To
the best of our knowledge, the PB concept for non-binary 
sequences is new and has not been studied before. It is of 
particular interest for applications which demand a balancing 
of positive and negative symbols, possibly in combination with 
a charge constraint. In this paper, we determine the number 
of q-ary PB sequences of length n as well as the number of
q-ary sequences of length n which are CPB, i.e., both CB
and PB. From this, we derive expressions for the minimum 
redundancy of PB and CPB codes, which are compared to the 
corresponding expressions for SB and CB codes. 

A celebrated method to generate and decode bipolar
balanced sequences of even length n was presented by Knuth [9].
The key idea is to invert the first z symbols of the information
sequence such that the resulting sequence is balanced. Knuth
showed that it is always possible to find at least one such
balancing index z. By communicating the value of z through
a (balanced) prefix, decoding can be performed by inverting
the first z symbols of the coded sequence. The redundancy
of this elegant method is roughly $\log_2 n$, which is about
twice the minimum and can thus be considered as a price
to be paid for simplicity. In this paper, we extend Knuth's
method, which assumes bipolar sequences, to larger alphabets.
In particular, we present Knuth-like design methods for all 
balancing perspectives under consideration, i.e., for SB, CB, 
PB, and CPB. 

The rest of this paper is organized as follows. In Section II,
some definitions and preliminaries are presented. Then,
in Section III, we first review known expressions for the
maximum sizes of q-ary SB and CB codes of length n, as
well as the minimal redundancy of these codes. We then
derive the corresponding expressions for PB and CPB codes.
In Section IV, we describe Knuth-like constructions for a
variety of codes with various combinations of SB, CB, and
PB properties. Finally, the paper is concluded in Section V.

II. Preliminaries 
A. Alphabets and Balancing 
In Section I, we introduced the alphabet

$$\mathcal{A}_q = \{-q+1, -q+3, -q+5, \ldots, q-3, q-1\},$$

where $q \geq 2$. We now formally define when a sequence
$x = (x_1, x_2, \ldots, x_n) \in (\mathcal{A}_q)^n$ is balanced, for each of the
considered perspectives.

• A sequence x of length $n = qm$, with $m \geq 1$, is symbol-
balanced (SB) if all q symbols in $\mathcal{A}_q$ appear equally often
in x, i.e.,

$$|\{i : x_i = j\}| = m$$

for all $j \in \mathcal{A}_q$.

• A sequence x of length n, with n being a positive integer
which is even if q is even, is charge-balanced (CB) if
the sum of all symbols in x is equal to 0, i.e.,

$$\sum_{i=1}^{n} x_i = 0.$$

• A sequence x of length n, with n being a positive integer
which is even if q is even, is polarity-balanced (PB) if
the number of positive symbols in x equals the number
of negative symbols, i.e.,

$$|\{i : x_i > 0\}| = |\{i : x_i < 0\}|.$$

• A sequence x of length n, with n being a positive integer
which is even if q is even, is charge- and polarity-balanced
(CPB) if it is both CB and PB.

Note that for lengths n which do not comply with the 
specifications, there exist no sequences satisfying the desired 
property. Hence, throughout this paper, we will assume that n 
is a multiple of q for SB codes and that, in case q is even, n 
is even for CB, PB, and CPB codes. 
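The four definitions translate directly into membership tests. The following Python sketch (the helper names `alphabet`, `is_sb`, `is_cb`, `is_pb`, and `is_cpb` are our own, not taken from the paper) checks some of the example sequences given in Section I.

```python
from collections import Counter


def alphabet(q):
    """Symmetric PAM alphabet A_q = {-q+1, -q+3, ..., q-3, q-1}."""
    return [-q + 1 + 2 * j for j in range(q)]


def is_sb(x, q):
    """Symbol balanced: every symbol of A_q occurs exactly len(x)/q times."""
    if len(x) % q != 0:
        return False
    counts = Counter(x)
    return all(counts[a] == len(x) // q for a in alphabet(q))


def is_cb(x):
    """Charge balanced: the symbol sum is zero."""
    return sum(x) == 0


def is_pb(x):
    """Polarity balanced: as many positive as negative symbols (zeros ignored)."""
    return sum(v > 0 for v in x) == sum(v < 0 for v in x)


def is_cpb(x):
    """Charge and polarity balanced."""
    return is_cb(x) and is_pb(x)


# Examples from Section I
ternary = (-2, -2, 2, 0, -2, 2, 2, 2, -2)
print(is_cpb(ternary), is_sb(ternary, 3))            # True False
print(is_pb((-3, -1, 1, 1)), is_cb((-3, -1, 1, 1)))  # True False
```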

When studying q-ary balanced codes, alphabets other than
$\mathcal{A}_q$ have also been considered in the literature, a prominent
example being

$$\mathbb{Z}_q = \{0, 1, \ldots, q-1\}.$$

Also balanced codes over the roots-of-unity alphabet

$$\Phi_q = \{e^{2\pi i h/q} : h = 0, 1, \ldots, q-1\},$$

where $i = \sqrt{-1}$, have received considerable attention, e.g., [?],
[12]. The choice of the alphabet may influence the balancing
notion. This is not the case for symbol balancing, which is
clearly independent of the symbol representation: the number of
SB sequences of a given length n is the same for any q-ary
alphabet. The same conclusion holds for polarity balancing,
as long as the alphabet symbols are divided into two
classes of equal size, with one neutral symbol in case q is
odd. However, the notion of charge balancing is coupled to
the choice of the alphabet. First of all, it requires that an
additive operation is defined on the alphabet symbols; this
operation need not be closed with respect to the alphabet, i.e.,
a sum of alphabet symbols may take values outside the
alphabet. The name 'charge' and the choice to fix the sequence
symbol sum $\sum_{i=1}^{n} x_i$ to zero, as in the CB definition above,
have been inspired by practical PAM-like applications.
However, in other cases it may be desirable to fix the sum to
another value. Also, the maximum number of CB sequences of
a certain length may depend on the choice of the alphabet: for
an irregularly spaced alphabet, other results could be obtained
than for a regularly spaced alphabet like $\mathcal{A}_q$.

Throughout this paper, we will assume that the code alphabet
is $\mathcal{A}_q$. Still, many derived results on maximum code sizes,
minimum redundancies, etc., are also valid for other alphabets.
In particular, when the alphabet can be obtained by applying a
bijective mapping of the form

$$x \mapsto ax + b \tag{1}$$

to the symbols from $\mathcal{A}_q$, where $a \neq 0$ and b are real numbers,
then all results obtained for $\mathcal{A}_q$ also hold for the other alphabet
(and vice versa), even the CB results. Note that $\mathbb{Z}_q$ is within
this category (by choosing $a = -1/2$ and $b = (q-1)/2$).
This implies that in $\mathbb{Z}_q$, the symbols smaller than $(q-1)/2$
should be called 'positive' and the symbols larger than $(q-1)/2$
'negative'. Furthermore, the charge constraint should be
replaced by $\sum_{i=1}^{n} x_i = n(q-1)/2$ in case the alphabet is $\mathbb{Z}_q$.
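As a small sanity check of this mapping argument, the sketch below maps $\mathcal{A}_q$ to $\mathbb{Z}_q$ with $a = -1/2$ and $b = (q-1)/2$ and verifies exhaustively, for a few small n, that charge balance over $\mathcal{A}_q$ corresponds to the symbol sum $n(q-1)/2$ over $\mathbb{Z}_q$ and that positive symbols are mapped below $(q-1)/2$. The helper names `to_zq` and `check_mapping` are hypothetical.

```python
from itertools import product


def alphabet(q):
    return [-q + 1 + 2 * j for j in range(q)]


def to_zq(x, q):
    """Apply x -> a*x + b with a = -1/2, b = (q-1)/2; q-1-x is always even."""
    return (q - 1 - x) // 2


def check_mapping(q, n):
    """Exhaustively compare the CB and polarity notions over A_q and Z_q."""
    for x in product(alphabet(q), repeat=n):
        z = [to_zq(v, q) for v in x]
        # CB over A_q  <=>  symbol sum n(q-1)/2 over Z_q
        if (sum(x) == 0) != (2 * sum(z) == n * (q - 1)):
            return False
        # positive in A_q  <=>  smaller than (q-1)/2 in Z_q
        if any((v > 0) != (2 * w < q - 1) for v, w in zip(x, z)):
            return False
    return True


print([to_zq(v, 4) for v in alphabet(4)])   # [3, 2, 1, 0]
print(check_mapping(4, 4), check_mapping(5, 3))
```

Working with `(q - 1 - x) // 2` keeps everything in exact integer arithmetic, which avoids floating-point issues with the factor $-1/2$.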

B. Codes and Redundancy 

A code of length n is a set of sequences of length n. A code
is said to be SB, CB, PB, or CPB if all codewords satisfy
the respective properties. The sets of all SB, CB, PB, and
CPB sequences of length n over $\mathcal{A}_q$ are denoted by $\mathcal{C}_{SB}(n,q)$,
$\mathcal{C}_{CB}(n,q)$, $\mathcal{C}_{PB}(n,q)$, and $\mathcal{C}_{CPB}(n,q)$, respectively, and their
sizes by $M_{SB}(n,q)$, $M_{CB}(n,q)$, $M_{PB}(n,q)$, and $M_{CPB}(n,q)$,
respectively. The redundancy r of a q-ary code of length n
and size M is

$$r = n - \log_q M. \tag{2}$$

The minimum redundancies of SB, CB, PB, and CPB codes
of length n over $\mathcal{A}_q$ are denoted by $r_{SB}(n,q)$, $r_{CB}(n,q)$,
$r_{PB}(n,q)$, and $r_{CPB}(n,q)$, respectively.

C. Stirling Approximation 

In this paper, we will derive (asymptotic) expressions for the
minimum redundancy. In the analysis we make frequent and
implicit use of Stirling's approximation for factorials, stated
here for convenience. For $n \geq 1$, it holds that

$$n! = \sqrt{2\pi n}\left(\frac{n}{e}\right)^n e^{\lambda_n},$$

where $\frac{1}{12n+1} < \lambda_n < \frac{1}{12n}$. Hence,

$$n! = \sqrt{2\pi n}\left(\frac{n}{e}\right)^n \left(1 + O\left(\frac{1}{n}\right)\right), \tag{3}$$

and thus, for large values of n, we can use the approximation

$$n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^n. \tag{4}$$

D. Gaussian Approximation

Another tool which we will frequently use is the following
Gaussian approximation technique. We consider the symbols
$x_i$ in a sequence x as n independent random variables which
are uniformly drawn from the alphabet $\mathcal{A}_q$, and we let X denote
a random variable that is uniformly distributed over $\mathcal{A}_q$. We are
interested in the distribution of the sum $\sum_{i=1}^{n} \phi(x_i)$, where $\phi$ is a
function mapping symbols from $\mathcal{A}_q$ to real numbers, which
has the property that the possible outcomes of the sum form a
set of consecutive integers. Then, by the Central Limit
Theorem, the probability that this sum takes the integer value
s is approximately

$$\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(s-\mu)^2}{2\sigma^2}},$$

with mean

$$\mu = n\,E[\phi(X)] = \frac{n}{q}\sum_{j=0}^{q-1} \phi(q-1-2j) \tag{5}$$

and variance

$$\sigma^2 = n\left(E[(\phi(X))^2] - (E[\phi(X)])^2\right). \tag{6}$$

Hence, the number of q-ary sequences of length n with
$\sum_{i=1}^{n} \phi(x_i)$ equal to s is approximately

$$\frac{q^n}{\sigma\sqrt{2\pi}}\, e^{-\frac{(s-\mu)^2}{2\sigma^2}}. \tag{7}$$

Note that for fixed n and q this expression is maximum if s
is equal to $\mu$, which leads to a minimum redundancy of

$$\frac{1}{2}\log_q\left(2\pi\sigma^2\right)$$

when substituting (7) for M in (2).
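The Gaussian approximation (7) is easy to test numerically. The sketch below (function names are hypothetical) counts the sequences with $\sum_i \phi(x_i) = s$ exactly, by dynamic programming over the partial sums, and compares the result with (7); $\phi$ is passed in as a Python function, and exact fractions are used so that half-integer values such as $\phi(x) = x/2$ cause no rounding.

```python
import math
from collections import defaultdict
from fractions import Fraction


def alphabet(q):
    return [-q + 1 + 2 * j for j in range(q)]


def exact_count(phi, q, n, s):
    """Exact number of x in A_q^n with sum(phi(x_i)) == s, by dynamic programming."""
    dist = {Fraction(0): 1}
    for _ in range(n):
        nxt = defaultdict(int)
        for total, cnt in dist.items():
            for a in alphabet(q):
                nxt[total + Fraction(phi(a))] += cnt
        dist = nxt
    return dist.get(Fraction(s), 0)


def gaussian_count(phi, q, n, s):
    """Approximation (7), with mean (5) and variance (6)."""
    vals = [float(phi(a)) for a in alphabet(q)]
    mean = sum(vals) / q
    mu = n * mean                                          # (5)
    var = n * (sum(v * v for v in vals) / q - mean ** 2)  # (6)
    return q ** n * math.exp(-(s - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def min_redundancy_estimate(phi, q, n):
    """(1/2) log_q(2 pi sigma^2): (7) at s = mu substituted into (2)."""
    vals = [float(phi(a)) for a in alphabet(q)]
    mean = sum(vals) / q
    var = n * (sum(v * v for v in vals) / q - mean ** 2)
    return 0.5 * math.log(2 * math.pi * var, q)


half = lambda x: Fraction(x, 2)   # phi(x) = x/2 makes the sums consecutive integers
for q, n in [(2, 40), (4, 20)]:
    print(q, n, gaussian_count(half, q, n, 0) / exact_count(half, q, n, 0))
```

The printed ratios should be close to 1 and approach 1 as n grows; for small n the local Gaussian approximation is visibly less accurate.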

III. Minimum Redundancy of Balanced Codes 

In this section, we consider the cardinalities of q-ary SB,
CB, PB, and CPB codes. From these cardinalities we derive
asymptotic expressions for the minimum redundancies. The
SB and CB results have been known for a long time but are
reconsidered here for completeness. The PB and CPB results
are new.


A. Symbol-Balanced Sequences 

For an SB code, all q alphabet symbols must appear equally 
often in any codeword of length n. Hence, the problem of de- 
termining the number of such words boils down to a standard 
combinatorial problem. This number and the consequence with 
respect to minimum redundancy, as already discussed in [11], 
are as follows. 

Theorem 1. For any q and $n = mq$, it holds that

$$M_{SB}(n,q) = \frac{n!}{((n/q)!)^q} \approx q^n \sqrt{\frac{q^q}{(2\pi n)^{q-1}}}.$$

Proof. The equality follows from straightforward combina-
torics and the approximation from multiple uses of Stirling's
formula (4). □

Corollary 2. For any q and $n = mq$, it holds that

$$r_{SB}(n,q) = n - \log_q M_{SB}(n,q) \approx \frac{q-1}{2}\log_q n + \frac{q-1}{2}\log_q 2\pi - \frac{q}{2}.$$

Proof. The equality follows (by definition) from (2) and the
approximation from Theorem 1. □
By using (3) rather than (4), the more precise expressions

$$M_{SB}(n,q) = q^n \sqrt{\frac{q^q}{(2\pi n)^{q-1}}}\left(1 + O\left(\frac{1}{n}\right)\right)$$

and

$$r_{SB}(n,q) = \frac{q-1}{2}\log_q n + \frac{q-1}{2}\log_q 2\pi - \frac{q}{2} + O\left(\frac{1}{n}\right)$$

are obtained. Hence, the error of the approximation from
Corollary 2 vanishes as $n \to \infty$. This also holds for the
approximate minimum redundancy expressions which will be
presented in the subsequent subsections. In Subsection III-E, we
will illustrate the accuracy of the approximate expressions for
finite values of n.
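The exact count of Theorem 1 and the redundancy expression of Corollary 2 can be compared directly; Python's arbitrary-precision integers keep the multinomial coefficient exact. A minimal sketch (function names are our own):

```python
import math


def m_sb(n, q):
    """Theorem 1, exact: multinomial coefficient n! / ((n/q)!)^q."""
    if n % q:
        raise ValueError("n must be a multiple of q")
    return math.factorial(n) // math.factorial(n // q) ** q


def m_sb_approx(n, q):
    """Theorem 1, approximation: q^n * sqrt(q^q / (2 pi n)^(q-1))."""
    return q ** n * math.sqrt(q ** q / (2 * math.pi * n) ** (q - 1))


def r_sb_exact(n, q):
    """Definition (2) applied to M_SB(n, q)."""
    return n - math.log(m_sb(n, q), q)


def r_sb_approx(n, q):
    """Corollary 2."""
    return (q - 1) / 2 * math.log(n, q) + (q - 1) / 2 * math.log(2 * math.pi, q) - q / 2


for q, n in [(2, 10), (3, 30), (3, 300), (4, 400)]:
    print(q, n, round(r_sb_exact(n, q), 4), round(r_sb_approx(n, q), 4))
```

In line with the $O(1/n)$ terms above, the gap between the two columns shrinks roughly in proportion to $1/n$.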

B. Charge-Balanced Sequences 

As observed by Capocelli et al. Q in their investigation of 
q-axy immutable codes, the number of words in a CB code of 
length n was studied by Star [15] in the context of his analysis 
of the number of restricted compositions of a positive integer. 
The final result is as stated in the next theorem, for which we 
provide a simple alternative proof. 

Theorem 3. For any q and n (which is even if q is even), it holds
that

$$M_{CB}(n,q) \approx q^n \sqrt{\frac{6}{\pi n (q^2-1)}}.$$

Proof. We use the Gaussian approximation technique as
discussed in Subsection II-D. Choosing the function $\phi$ to be

$$\phi(x) = \frac{x}{2}, \tag{8}$$

it follows that the number of sequences x over $\mathcal{A}_q$ of length
n with $\sum_{i=1}^{n} \phi(x_i) = s$ is approximately equal to (7) with mean

$$\mu = \frac{n}{q}\sum_{j=0}^{q-1}\frac{q-1-2j}{2} = 0 \tag{9}$$

(from (5) and (8)) and variance

$$\sigma^2 = \frac{n}{q}\sum_{j=0}^{q-1}\left(\frac{q-1-2j}{2}\right)^2 = \frac{n(q^2-1)}{12} \tag{10}$$

(from (6), (8), and (9)). Note that CB sequences are charac-
terized by the fact that s = 0, and thus substitution of this
value in (7), with $\mu = 0$ and $\sigma^2 = n(q^2-1)/12$, provides
an approximation of $M_{CB}(n,q)$. The result is as given in the
theorem. □

Corollary 4. For any q and n (which is even if q is even), it
holds that

$$r_{CB}(n,q) = n - \log_q M_{CB}(n,q) \approx \frac{1}{2}\log_q n + \frac{1}{2}\log_q\frac{\pi(q^2-1)}{6}.$$

Proof. The equality follows (by definition) from (2) and the
approximation from Theorem 3. □
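Theorem 3 can be checked against an exact count. The sketch below (hypothetical helper names) counts zero-sum sequences over $\mathcal{A}_q$ by dynamic programming over the partial charge and compares the count, and the resulting redundancy, with Theorem 3 and Corollary 4.

```python
import math


def m_cb(n, q):
    """Exact number of length-n sequences over A_q with zero symbol sum."""
    counts = {0: 1}
    for _ in range(n):
        nxt = {}
        for total, c in counts.items():
            for a in range(-q + 1, q, 2):          # the symbols of A_q
                nxt[total + a] = nxt.get(total + a, 0) + c
        counts = nxt
    return counts.get(0, 0)


def m_cb_approx(n, q):
    """Theorem 3: q^n * sqrt(6 / (pi n (q^2 - 1)))."""
    return q ** n * math.sqrt(6 / (math.pi * n * (q * q - 1)))


def r_cb_exact(n, q):
    return n - math.log(m_cb(n, q), q)


def r_cb_approx(n, q):
    """Corollary 4."""
    return 0.5 * math.log(n, q) + 0.5 * math.log(math.pi * (q * q - 1) / 6, q)


for q, n in [(2, 40), (3, 40), (4, 40), (6, 40)]:
    print(q, n, m_cb(n, q) / m_cb_approx(n, q), r_cb_exact(n, q), r_cb_approx(n, q))
```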



C. Polarity-Balanced Sequences 

When calculating the number of q-ary PB sequences of length 
n, we distinguish between the cases q is even and q is odd, 
since in the latter case we should take into account the fact 
that the code alphabet contains the symbol '0' which is of 
indeterminate polarity. The results are presented in the next 
theorems, while expressions for the minimum redundancies 
of PB codes are given in the subsequent corollaries. 

Theorem 5. For any even q and even n, it holds that

$$M_{PB}(n,q) = \binom{n}{n/2}\left(\frac{q}{2}\right)^n \tag{11}$$

$$\approx q^n \sqrt{\frac{2}{\pi n}}. \tag{12}$$

Proof. The equality (11) follows by observing that there
are $\binom{n}{n/2}$ ways to create a balanced polarity pattern over
n positions and that for each such pattern we have q/2
symbol options for every position. The approximation can
be obtained by multiple uses of Stirling's formula (4) or by
applying the Gaussian approximation technique discussed in
Subsection II-D. Here, we opt for the latter, since intermediate
results also turn out to be useful for the CPB case. Choosing
the function $\phi$ to be

$$\phi(x) = \begin{cases} -1/2, & \text{if } x < 0, \\ +1/2, & \text{if } x > 0, \end{cases} \tag{13}$$

it follows that the number of q-ary sequences x of length n
with $\sum_{i=1}^{n} \phi(x_i) = s$ is approximately equal to (7) with mean

$$\mu = \frac{n}{q}\sum_{j=0}^{q-1}\phi(q-1-2j) = 0 \tag{14}$$

(from (5) and (13)) and variance

$$\sigma^2 = \frac{n}{q}\sum_{j=0}^{q-1}\left(\phi(q-1-2j)\right)^2 = \frac{n}{4} \tag{15}$$

(from (6), (13), and (14)). Note that PB sequences are char-
acterized by the fact that s = 0, and thus substitution of this
value in (7), with $\mu = 0$ and $\sigma^2 = n/4$, gives (12). □

Corollary 6. For any even q and even n, it holds that

$$r_{PB}(n,q) = n - \log_q M_{PB}(n,q) \approx \frac{1}{2}\log_q n + \frac{1}{2}\log_q\frac{\pi}{2}.$$

Proof. The equality follows (by definition) from (2) and the
approximation from Theorem 5. □
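For even q, the closed form (11) can be cross-checked against brute-force enumeration for tiny parameters and then compared with (12) and Corollary 6 for larger n. A sketch with hypothetical function names:

```python
import math
from itertools import product


def m_pb_even(n, q):
    """Equation (11): C(n, n/2) * (q/2)^n, for even q and even n."""
    return math.comb(n, n // 2) * (q // 2) ** n


def m_pb_even_approx(n, q):
    """Equation (12): q^n * sqrt(2 / (pi n))."""
    return q ** n * math.sqrt(2 / (math.pi * n))


def r_pb_even_approx(n, q):
    """Corollary 6."""
    return 0.5 * math.log(n, q) + 0.5 * math.log(math.pi / 2, q)


def m_pb_bruteforce(n, q):
    """Count PB sequences by exhaustive enumeration (small n only)."""
    symbols = range(-q + 1, q, 2)
    return sum(1 for x in product(symbols, repeat=n)
               if sum(v > 0 for v in x) == sum(v < 0 for v in x))


print(m_pb_even(4, 4), m_pb_bruteforce(4, 4))        # 96 96
n, q = 100, 4
print(m_pb_even_approx(n, q) / m_pb_even(n, q))
print(n - math.log(m_pb_even(n, q), q), r_pb_even_approx(n, q))
```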



Theorem 7. For any n and odd q, it holds that

$$M_{PB}(n,q) = \sum_{j=0}^{\lfloor n/2 \rfloor} \frac{n!}{j!\,j!\,(n-2j)!}\left(\frac{q-1}{2}\right)^{2j} \tag{16}$$

$$\approx q^n \sqrt{\frac{q}{2\pi n(q-1)}}. \tag{17}$$

Proof. The number of q-ary PB sequences of length n with j
positive symbols, j negative symbols, and thus $n-2j$ neutral
symbols, is $\frac{n!}{j!\,j!\,(n-2j)!}\left(\frac{q-1}{2}\right)^{2j}$, since there are $\frac{n!}{j!\,j!\,(n-2j)!}$ ways
to create the positive/negative/neutral pattern over n positions
and for each such pattern we have $(q-1)/2$ symbol options
for every non-neutral position. Summing over all possible
values of j shows (16).

In order to obtain a simple expression for large values of n,
we again use the Gaussian approximation technique introduced
in Subsection II-D. Proceeding as in the proof of Theorem 5,
while replacing the function $\phi$ by

$$\phi(x) = \begin{cases} -1, & \text{if } x < 0, \\ 0, & \text{if } x = 0, \\ +1, & \text{if } x > 0, \end{cases} \tag{18}$$

giving mean

$$\mu = \frac{n}{q}\sum_{j=0}^{q-1}\phi(q-1-2j) = 0 \tag{19}$$

(from (5) and (18)) and variance

$$\sigma^2 = \frac{n}{q}\sum_{j=0}^{q-1}\left(\phi(q-1-2j)\right)^2 = \frac{n(q-1)}{q} \tag{20}$$

(from (6), (18), and (19)), we obtain (17). □



Corollary 8. For any n and odd q, it holds that

$$r_{PB}(n,q) = n - \log_q M_{PB}(n,q) \approx \frac{1}{2}\log_q n + \frac{1}{2}\log_q\frac{2\pi(q-1)}{q}.$$

Proof. The equality follows (by definition) from (2) and the
approximation from Theorem 7. □
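For odd q, the sum (16) is also straightforward to evaluate exactly. The sketch below (hypothetical names) cross-checks it by enumeration for small n and compares it with (17) and Corollary 8.

```python
import math
from itertools import product


def m_pb_odd(n, q):
    """Equation (16): sum_j n!/(j! j! (n-2j)!) * ((q-1)/2)^(2j), for odd q."""
    f = math.factorial
    h = (q - 1) // 2
    return sum(f(n) // (f(j) ** 2 * f(n - 2 * j)) * h ** (2 * j)
               for j in range(n // 2 + 1))


def m_pb_odd_approx(n, q):
    """Equation (17): q^n * sqrt(q / (2 pi n (q-1)))."""
    return q ** n * math.sqrt(q / (2 * math.pi * n * (q - 1)))


def r_pb_odd_approx(n, q):
    """Corollary 8."""
    return 0.5 * math.log(n, q) + 0.5 * math.log(2 * math.pi * (q - 1) / q, q)


def m_pb_bruteforce(n, q):
    """Count PB sequences by exhaustive enumeration (small n only)."""
    symbols = range(-q + 1, q, 2)
    return sum(1 for x in product(symbols, repeat=n)
               if sum(v > 0 for v in x) == sum(v < 0 for v in x))


print(m_pb_odd(5, 5), m_pb_bruteforce(5, 5))
n, q = 100, 5
print(m_pb_odd_approx(n, q) / m_pb_odd(n, q))
print(n - math.log(m_pb_odd(n, q), q), r_pb_odd_approx(n, q))
```

For q = 3, the same function reproduces the ternary CB counts. This is expected, since CB and PB coincide over $\mathcal{A}_3$.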
D. Charge & Polarity-Balanced Sequences

Since each of the alphabets $\mathcal{A}_2 = \{-1, +1\}$ and
$\mathcal{A}_3 = \{-2, 0, +2\}$ has exactly one positive and one negative
symbol, which have equal absolute value, it follows immediately
from the definitions that the CB and PB constraints are
completely equivalent for sequences over these alphabets.
Therefore, for $q \leq 3$, any CB sequence is also PB, and vice
versa.

Hence, the minimum redundancy of a binary/bipolar CPB
code of even length n satisfies

$$r_{CPB}(n,2) = r_{CB}(n,2) = r_{PB}(n,2) \approx \frac{1}{2}\log_2 n + \frac{1}{2}\log_2\frac{\pi}{2},$$

where the final expression follows from Corollary 4 or 6.
Furthermore, note that we have the same expression for
$r_{SB}(n,2)$; see Corollary 2. This does not come as a surprise,
as all balancing perspectives under consideration in the paper
are equivalent in the binary/bipolar case.

For the minimum redundancy of a ternary CPB code of
length n, we find

$$r_{CPB}(n,3) = r_{CB}(n,3) = r_{PB}(n,3) \approx \frac{1}{2}\log_3 n + \frac{1}{2}\log_3\frac{4\pi}{3},$$

where the final expression follows from Corollary 4 or 8. In
this case, the corresponding expression for symbol balancing,
provided by Corollary 2, is

$$r_{SB}(n,3) \approx \log_3 n + \log_3 2\pi - \frac{3}{2},$$

which exceeds $r_{CPB}(n,3)$ roughly by a factor of two.

As already argued in Section I, the notions of CB and PB
are not the same in case $q \geq 4$. First, we precisely determine,
by combinatorial arguments, the number of CPB sequences
of length n in case q = 4. Then, we derive approximate
expressions for the number of CPB sequences for $q \geq 4$, from
which we obtain the minimum redundancy.

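We can count the number of CPB sequences over $\mathcal{A}_4$ of
even length n as follows. Polarity balancing requires that n/2
positions take values in $\{-3, -1\}$. If the number of such
positions taking value -3 is i, then these positions contribute
$-3i - (n/2 - i) = -2i - n/2$ to the charge, while the complementary
n/2 positions, taking values in $\{+1, +3\}$, contribute $2i' + n/2$
if i' of them take the value +3. Hence, charge balancing requires
that there are exactly $i' = i$ positions with value +3.
Therefore, the size of the intersection of the sets of CB and
PB sequences is given by

$$M_{CPB}(n,4) = \binom{n}{n/2}\sum_{i=0}^{n/2}\binom{n/2}{i}\binom{n/2}{i} = \binom{n}{n/2}\sum_{i=0}^{n/2}\binom{n/2}{i}\binom{n/2}{(n/2)-i} = \binom{n}{n/2}^2, \tag{21}$$

where the last step follows from Vandermonde's identity.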
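Equation (21) is easy to confirm by brute force for small even n. The sketch below (hypothetical function names) enumerates all of $\mathcal{A}_4^n$, keeps the sequences that are both CB and PB, and compares the count with both forms of (21).

```python
import math
from itertools import product


def m_cpb4_bruteforce(n):
    """Count sequences over A_4 that are both charge and polarity balanced."""
    return sum(1 for x in product((-3, -1, 1, 3), repeat=n)
               if sum(x) == 0 and sum(v > 0 for v in x) == sum(v < 0 for v in x))


def m_cpb4_formula(n):
    """Equation (21) before simplification: C(n, n/2) * sum_i C(n/2, i) C(n/2, n/2 - i)."""
    h = n // 2
    return math.comb(n, h) * sum(math.comb(h, i) * math.comb(h, h - i) for i in range(h + 1))


for n in (2, 4, 6, 8):
    print(n, m_cpb4_bruteforce(n), m_cpb4_formula(n), math.comb(n, n // 2) ** 2)
```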

It seems to be cumbersome to extend the arguments used
in the q = 4 case to determine $M_{CPB}(n,q)$ for larger values
of q. However, the elegant Gaussian approximation method is
still feasible, albeit that we need a joint distribution this time,
since we have two constraints. The results are presented in the
next theorems and corollaries.

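Theorem 9. For any even $q \geq 4$ and even n, it holds that

$$M_{CPB}(n,q) \approx q^n \frac{4\sqrt{3}}{\pi n\sqrt{q^2-4}}.$$

Proof. We consider the symbols $x_i$ in a sequence x as n
independent random variables which are uniformly drawn
from the alphabet $\mathcal{A}_q$ with $q \geq 4$ even. We are interested
in the joint distribution of the sums $S_1 = \sum_{i=1}^{n} x_i/2$ and
$S_2 = \sum_{i=1}^{n} \phi(x_i)$, where $\phi$ is as defined in (13). The
probability that these sums take the integer values $s_1$ and $s_2$,
respectively, is approximately

$$\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\, e^{-\frac{f(s_1,s_2)}{2(1-\rho^2)}},$$

where

$$f(s_1,s_2) = \sum_{i=1}^{2}\left(\frac{s_i-\mu_i}{\sigma_i}\right)^2 - \frac{2\rho(s_1-\mu_1)(s_2-\mu_2)}{\sigma_1\sigma_2},$$

$$\mu_1 = 0 \text{ (from (9))}, \qquad \sigma_1 = \sqrt{\frac{n(q^2-1)}{12}} \text{ (from (10))},$$

$$\mu_2 = 0 \text{ (from (14))}, \qquad \sigma_2 = \sqrt{\frac{n}{4}} \text{ (from (15))},$$

and the correlation coefficient is

$$\rho = \frac{E[(S_1-\mu_1)(S_2-\mu_2)]}{\sigma_1\sigma_2} = \frac{E[S_1S_2]}{\sigma_1\sigma_2} = \frac{nq/8}{\sqrt{\frac{n(q^2-1)}{12}}\sqrt{\frac{n}{4}}} = \sqrt{\frac{3q^2}{4(q^2-1)}}.$$

Here we used $E[S_1S_2] = n\,E\left[\frac{X}{2}\phi(X)\right] = \frac{n}{4q}\sum_{x\in\mathcal{A}_q}|x| = \frac{nq}{8}$,
since the cross terms vanish for independent zero-mean symbols
and $\sum_{x\in\mathcal{A}_q}|x| = q^2/2$ for even q. Hence, the number of
q-ary sequences of length n with $S_1 = s_1$ and $S_2 = s_2$ is
approximately

$$\frac{q^n}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\, e^{-\frac{f(s_1,s_2)}{2(1-\rho^2)}}. \tag{22}$$

Substitution of $s_1 = 0$ (the charge constraint), $s_2 = 0$ (the
polarity constraint), and the two mean values, the two standard
deviations, and the correlation coefficient gives the stated
result. □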
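The correlation coefficient and the count of Theorem 9 can be checked numerically. In the sketch below (names are hypothetical), `rho_squared` computes $\rho^2$ exactly from a single uniformly drawn symbol, which is valid because n cancels for sums of i.i.d. terms, and `m_cpb_exact` counts CPB sequences by dynamic programming over (charge, number of positive minus number of negative symbols).

```python
import math
from collections import defaultdict
from fractions import Fraction


def alphabet(q):
    return [-q + 1 + 2 * j for j in range(q)]


def phi_half(a):
    """Equation (13): -1/2 for negative and +1/2 for positive symbols (q even)."""
    return Fraction(1, 2) if a > 0 else Fraction(-1, 2)


def rho_squared(q, phi):
    """Squared correlation of X/2 and phi(X), X uniform on A_q (both have mean 0)."""
    A = alphabet(q)
    cov = sum(Fraction(a, 2) * phi(a) for a in A) / q
    var1 = sum(Fraction(a, 2) ** 2 for a in A) / q
    var2 = sum(Fraction(phi(a)) ** 2 for a in A) / q
    return cov * cov / (var1 * var2)


def m_cpb_exact(n, q):
    """Exact CPB count by DP over (charge, #positive - #negative)."""
    dist = {(0, 0): 1}
    for _ in range(n):
        nxt = defaultdict(int)
        for (charge, pol), cnt in dist.items():
            for a in alphabet(q):
                nxt[(charge + a, pol + (a > 0) - (a < 0))] += cnt
        dist = nxt
    return dist.get((0, 0), 0)


def m_cpb_approx_even(n, q):
    """Theorem 9: q^n * 4 sqrt(3) / (pi n sqrt(q^2 - 4))."""
    return q ** n * 4 * math.sqrt(3) / (math.pi * n * math.sqrt(q * q - 4))


for q in (4, 6, 8):
    print(q, rho_squared(q, phi_half), Fraction(3 * q * q, 4 * (q * q - 1)))
for q, n in [(4, 20), (6, 20)]:
    print(q, n, m_cpb_approx_even(n, q) / m_cpb_exact(n, q))
```

For q = 4, the exact count equals $\binom{n}{n/2}^2$ from (21), so this also gives an independent check of the Gaussian result.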
Note that this theorem gives

$$M_{CPB}(n,4) \approx 4^n \frac{2}{\pi n},$$

a result which can also be obtained by applying Stirling's
formula (4) multiple times to (21).

Corollary 10. For any even $q \geq 4$ and even n, it holds that

$$r_{CPB}(n,q) = n - \log_q M_{CPB}(n,q) \approx \log_q n + \log_q\left(\pi\sqrt{\frac{q^2-4}{48}}\right).$$

Proof. The equality follows (by definition) from (2) and the
approximation from Theorem 9. □

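Theorem 11. For any n and odd $q \geq 5$, it holds that

$$M_{CPB}(n,q) \approx q^{n+1} \frac{2\sqrt{3}}{\pi n\sqrt{(q^2-1)(q-1)(q-3)}}.$$

Proof. We follow the same reasoning as in the proof of
Theorem 9, though now using (18) instead of (13) for the $\phi$
function. Consequently, the standard deviation of $S_2$ changes
to

$$\sigma_2 = \sqrt{\frac{n(q-1)}{q}} \text{ (from (20))},$$

and the correlation coefficient to

$$\rho = \frac{E[(S_1-\mu_1)(S_2-\mu_2)]}{\sigma_1\sigma_2} = \frac{E[S_1S_2]}{\sigma_1\sigma_2} = \frac{n(q^2-1)/(4q)}{\sqrt{\frac{n(q^2-1)}{12}}\sqrt{\frac{n(q-1)}{q}}} = \sqrt{\frac{3(q+1)}{4q}},$$

where now $E[S_1S_2] = \frac{n}{2q}\sum_{x\in\mathcal{A}_q}|x|$ with $\sum_{x\in\mathcal{A}_q}|x| = (q^2-1)/2$ for
odd q. The final result follows by substituting all the
parameters in (22). □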

Corollary 12. For any n and odd $q \geq 5$, it holds that

$$r_{CPB}(n,q) = n - \log_q M_{CPB}(n,q) \approx \log_q n + \frac{1}{2}\log_q\frac{\pi^2(q^2-1)(q-1)(q-3)}{12q^2}.$$

Proof. The equality follows (by definition) from (2) and the
approximation from Theorem 11. □
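The same numerical check applies to odd q. The sketch below (hypothetical names) verifies the correlation coefficient used in the proof of Theorem 11 exactly and validates the dynamic-programming CPB count against brute force for tiny cases. It then prints the ratio between the approximation of Theorem 11 and the exact count for growing n, which the asymptotic argument says should tend to 1.

```python
import math
from collections import defaultdict
from fractions import Fraction
from itertools import product


def alphabet(q):
    return [-q + 1 + 2 * j for j in range(q)]


def phi_sign(a):
    """Equation (18): -1, 0, +1 according to the polarity of the symbol."""
    return (a > 0) - (a < 0)


def rho_squared(q, phi):
    """Squared correlation of X/2 and phi(X), X uniform on A_q (both have mean 0)."""
    A = alphabet(q)
    cov = sum(Fraction(a, 2) * phi(a) for a in A) / q
    var1 = sum(Fraction(a, 2) ** 2 for a in A) / q
    var2 = sum(Fraction(phi(a)) ** 2 for a in A) / q
    return cov * cov / (var1 * var2)


def m_cpb_exact(n, q):
    """Exact CPB count by DP over (charge, #positive - #negative)."""
    dist = {(0, 0): 1}
    for _ in range(n):
        nxt = defaultdict(int)
        for (charge, pol), cnt in dist.items():
            for a in alphabet(q):
                nxt[(charge + a, pol + phi_sign(a))] += cnt
        dist = nxt
    return dist.get((0, 0), 0)


def m_cpb_bruteforce(n, q):
    return sum(1 for x in product(alphabet(q), repeat=n)
               if sum(x) == 0 and sum(map(phi_sign, x)) == 0)


def m_cpb_approx_odd(n, q):
    """Theorem 11: q^(n+1) * 2 sqrt(3) / (pi n sqrt((q^2-1)(q-1)(q-3)))."""
    return q ** (n + 1) * 2 * math.sqrt(3) / (
        math.pi * n * math.sqrt((q * q - 1) * (q - 1) * (q - 3)))


for n in (10, 20, 40):
    print(n, m_cpb_approx_odd(n, 5) / m_cpb_exact(n, 5))
```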

E. Discussion 

In this subsection, we discuss the results on the minimum
redundancy of balanced codes as obtained in this section.
As stated before, the minimum redundancy expressions as
presented in the corollaries are approximations whose error
vanishes as $n \to \infty$. For finite values of n, the accuracy of these
expressions depends on the convergence rates of the under-
lying Stirling/Gaussian approximations. Here, we provide an
illustration by showing some numerical values for $r_{CPB}(n,4)$,
i.e., the minimum redundancy of a CPB code of length n over
$\mathcal{A}_4$. From (2) and (21) we obtain the exact expression

$$r_{CPB}(n,4) = n - 2\log_4\binom{n}{n/2}, \tag{23}$$

while Corollary 10 gives the approximate expression

$$r_{CPB}(n,4) \approx \log_4(n\pi/2). \tag{24}$$

The comparison of these two expressions as given in Table I
shows that the approximation is quite accurate, even for small
values of n.



TABLE I
Numerical Values for $r_{CPB}(n,4)$

n       Exact, Eq. (23)    Approximation, Eq. (24)
10      2.0227             1.9867
20      2.5047             2.4867
40      2.9957             2.9867
60      3.2852             3.2792
80      3.4912             3.4867
100     3.6513             3.6477
200     4.1495             4.1477
400     4.6486             4.6477
600     4.9408             4.9402
800     5.1481             5.1477
1000    5.3090             5.3086
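The entries of Table I follow directly from (23) and (24). The short sketch below (function names are our own) recomputes them; `math.log` accepts arbitrarily large Python integers, so the binomial coefficient does not need to be converted first.

```python
import math


def r_cpb4_exact(n):
    """Equation (23): n - 2 log_4 C(n, n/2)."""
    return n - 2 * math.log(math.comb(n, n // 2), 4)


def r_cpb4_approx(n):
    """Equation (24): log_4(n pi / 2)."""
    return math.log(n * math.pi / 2, 4)


print(f"{'n':>5}  {'exact (23)':>10}  {'approx (24)':>11}")
for n in (10, 20, 40, 60, 80, 100, 200, 400, 600, 800, 1000):
    print(f"{n:>5}  {r_cpb4_exact(n):>10.4f}  {r_cpb4_approx(n):>11.4f}")
```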



TABLE II
Asymptotic normalized redundancies

          SB          CB     PB     CPB
q = 2     1/2         1/2    1/2    1/2
q = 3     1           1/2    1/2    1/2
q ≥ 4     (q-1)/2     1/2    1/2    1


Note that all minimum redundancy expressions are of the
form

$$r(n,q) \approx g(q)\log_q n + h(q),$$

where g and h are functions such that the output values may
depend on the alphabet size q but not on the block length
n. For comparison purposes, we introduce the asymptotic
normalized redundancy (ANR) as the redundancy divided by
$\log_q n$ in the limit of large values of n. Note that this ANR is
equal to g(q). For example, it follows from Corollary 2 that

$$g_{SB}(q) = \frac{q-1}{2}.$$

The complete overview of these ANRs is provided in Table II.
From this table, we conclude that the CB and PB properties are
equally expensive in terms of ANR, while the SB property is
q - 1 times as expensive. The combined CB and PB property
(CPB) is as expensive as either of the individual properties,
i.e., the other comes for free, if $q \leq 3$, while it costs the sum
of the individual contributions if $q \geq 4$.

IV. Constructions of Balanced Codes 

In the previous section we have determined expressions for 
the number M(n, q) of q-ary sequences of length n satisfy- 
ing certain balancing constraints. From these expressions we 
calculated the minimum required code redundancy to achieve 
the constraints. However, the lists of balanced words come
with little structure. Applying table look-up is only feasible
for small codes; for practical implementation of larger
codes, we need simple encoding and decoding algorithms.
Knuth presented such an algorithm for the case q = 2,
i.e., for binary/bipolar balanced codes [9]. Here, we will
propose extensions to non-binary codes from various balancing
perspectives.

All proposed methods take an approach similar to the 
original Knuth construction. We make simple and reversible 
modifications to a q-ary information sequence u of length k 
to obtain a q-ary balanced sequence x of the same length. 
Next, we create a q-ary balanced prefix p of length p, which 
uniquely identifies the modifications. The q-ary balanced code- 
word c = (p, x) of length n = p + k is then transmitted or 
stored. The receiver retrieves the modifications from the prefix 
and applies these in reverse on x to obtain the original u. 

The constructions are nice and simple, but not optimal 
with respect to redundancy. Note that all codewords consist 
of two parts which are both balanced, and thus words which 
are balanced overall, but not within these parts, are excluded. 
Hence, simplicity comes at a price of increased redundancy. In 
order to still keep the redundancy as small as possible within 
the construction framework, we should minimize the prefix 
length p. Since the prefix is much shorter than the information 
sequence, we will assume that encoding and decoding of the 
prefix can be done by table look-up or another minimum 
redundancy achieving method. Let the number of different
prefixes required to uniquely identify the modifications be
denoted by P. Ignoring balancing, the number of q-ary symbols
needed to represent the prefix is thus

$$p' = \log_q P, \tag{25}$$

which we will call the unbalanced redundancy. The actual
prefix length will be slightly larger, since the prefix needs
to be balanced as well. It should be chosen as the smallest
integer p such that

$$M(p,q) \geq P. \tag{26}$$

The analysis from the previous section shows that, for fixed
q, the extra redundancy to make the prefix balanced is of the
order of $\log p'$, i.e.,

$$p = p' + O(\log p').$$

Hence, for rough evaluation purposes, the unbalanced redun-
dancy p', which is easily determined by (25), may serve as a
satisfactory approximation of the actual redundancy p, which
requires the more cumbersome computation from (26).
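Criterion (26) amounts to a simple search over admissible prefix lengths. The sketch below (hypothetical function names) evaluates it for the two prefix types used in the examples of this section: balanced bipolar prefixes (even lengths only) for Knuth's code with k = 6, so P = 6, and PB prefixes over $\mathcal{A}_5$ for a PB code with k = 7, so P = qk = 35. The PB counts come from (16).

```python
import math


def m_cb_binary(p):
    """Number of balanced bipolar words of length p (zero for odd p)."""
    return math.comb(p, p // 2) if p % 2 == 0 else 0


def m_pb_odd(p, q):
    """Equation (16): number of PB words of length p over A_q, q odd."""
    f = math.factorial
    h = (q - 1) // 2
    return sum(f(p) // (f(j) ** 2 * f(p - 2 * j)) * h ** (2 * j)
               for j in range(p // 2 + 1))


def unbalanced_redundancy(P, q):
    """Equation (25): p' = log_q P."""
    return math.log(P, q)


def prefix_length(P, count, step=1):
    """Smallest admissible length p (a multiple of step) with count(p) >= P, cf. (26)."""
    p = step
    while count(p) < P:
        p += step
    return p


print(unbalanced_redundancy(6, 2), prefix_length(6, m_cb_binary, step=2))    # p' ≈ 2.58, p = 4
print(unbalanced_redundancy(35, 5), prefix_length(35, lambda p: m_pb_odd(p, 5)))  # p' ≈ 2.21, p = 4
```

In both cases the smallest admissible prefix length is 4, compared with unbalanced redundancies of about 2.6 and 2.2 symbols.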

All constructions will be presented for the code alphabet
$\mathcal{A}_q$, but equivalents for other alphabets, e.g., $\mathbb{Z}_q$, can be estab-
lished using the mapping (1). Before starting the descriptions
of the constructions, we introduce some more notation. The
real sum of all symbols in a q-ary sequence y is denoted by
Sum(y), i.e.,

$$\mathrm{Sum}(y) = \sum_i y_i.$$

Further, let $S_j(y)$ denote the number of appearances of the
alphabet symbol j in y, i.e.,

$$S_j(y) = |\{i : y_i = j\}|$$

for any alphabet symbol j. Finally, as a shorthand notation,
we denote a run of b symbols a by $a^b$, e.g., $3^2 1^3 (-1)^1 3^2$
denotes the sequence (3, 3, 1, 1, 1, -1, 3, 3).



A. Knuth's Construction

We start by stating Knuth's original construction for bipolar
codes [9], as a reference. For any information sequence u
of even length k and any $j \in \{0, 1, \ldots, k\}$, let $u'_j$ denote
the sequence u with the first j symbols multiplied by -1. A
balancing index is a number z for which $u'_z$ is balanced.
Knuth Encoding Procedure 

1) Determine a balancing index $z \in \{0, 1, \ldots, k-1\}$ for
the information sequence u.

2) Multiply the first z symbols of u by -1 to obtain the
balanced sequence x.

3) Map z to a unique balanced prefix p.

Then transmit or store the balanced codeword c = (p, x).
Knuth Decoding Procedure

1) Retrieve the balancing index z from p.

2) Multiply the first z symbols of x by -1 to retrieve u.
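Proof. It is easy to see that the operation in the encoding
procedure is properly reversed in the decoding procedure.
Hence, we only need to show that for every sequence u of
length k there exists at least one $z \in \{0, 1, \ldots, k-1\}$ such
that $u'_z$ is balanced, i.e., $\mathrm{Sum}(u'_z) = 0$. This immediately
follows from combining the following observations.

1) $\mathrm{Sum}(u'_0)$ is even.

2) $\mathrm{Sum}(u'_j) = \mathrm{Sum}(u'_{j-1}) \pm 2$ for all $j \in \{1, 2, \ldots, k\}$.

3) $\mathrm{Sum}(u'_k) = -\mathrm{Sum}(u'_0)$.

Thus $\mathrm{Sum}(u'_j)$ starts at an even value, moves in steps of ±2,
and ends at the negative of its starting value, so it equals zero
for some $j \in \{0, 1, \ldots, k\}$; if this happens only for j = k, then
$\mathrm{Sum}(u'_0) = -\mathrm{Sum}(u'_k) = 0$ as well, so z can always be chosen
from $\{0, 1, \ldots, k-1\}$.

□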

Since there are k possible values for z, the redundancy, i.e.,
the length p of the prefix, is a little bit more than $p' = \log_2 k$.

Example 1. For the bipolar sequence 

u= (+1,-1,+1,+1,+1,+1) 

of length 6, encoding goes as follows. 

1) Find the balancing index to be z = 4. 

2) Invert the first 4 positions of u, i.e., 

x= (-1, +1,-1,-1, +1,+1). 

3) Uniquely map the balancing index 4 to one of the six 
balanced sequences of length four, e.g., 

p = (+1,-1,-1, +1). 

Then the balanced transmitted/stored sequence is 

c= (p,x) = (+1,-1,-1, +1,-1, +1,-1,-1, +1,+1). 
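The procedures above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the balancing index is found with a running sum, and the prefix look-up table (`balanced_words`, a hypothetical choice) simply lists the balanced words of the prefix length in lexicographic order. Consequently, the prefix it assigns to z = 4 differs from the one chosen in Example 1, while x is the same.

```python
from itertools import product


def balanced_words(p):
    """Hypothetical prefix table: all balanced bipolar words of length p, lexicographic order."""
    return [w for w in product((-1, 1), repeat=p) if sum(w) == 0]


def invert_prefix(u, z):
    """Multiply the first z symbols by -1."""
    return [-v for v in u[:z]] + list(u[z:])


def balancing_index(u):
    """Smallest z in {0, ..., k-1} such that Sum(u'_z) = 0."""
    s = sum(u)
    for z in range(len(u)):
        if s == 0:
            return z
        s -= 2 * u[z]          # inverting u[z] changes the sum by -2*u[z]
    raise ValueError("no balancing index: the length of u must be even")


def knuth_encode(u, prefix_len):
    table = balanced_words(prefix_len)
    z = balancing_index(u)
    return list(table[z]) + invert_prefix(u, z)


def knuth_decode(c, prefix_len):
    table = balanced_words(prefix_len)
    z = table.index(tuple(c[:prefix_len]))
    return invert_prefix(c[prefix_len:], z)


u = [+1, -1, +1, +1, +1, +1]
c = knuth_encode(u, prefix_len=4)
print(balancing_index(u), c[4:])           # 4 [-1, 1, -1, -1, 1, 1]
print(knuth_decode(c, prefix_len=4) == u)  # True
```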

B. Polarity-Balanced Code Construction 

Knuth's original method for generating balanced binary se- 
quences can be adapted to generate q-ary PB sequences. This 
is rather straightforward, although there is a snag if q is odd. 
In this case, the number of zero-valued symbols in u may 
be of different parity than the length k, which results in an 
odd number of non-zero (either positive or negative) symbols. 
Since the value zero is (polarity-)neutral, i.e., neither positive 
nor negative, inversion of any number of symbols in u will 
not lead to a PB sequence in such a situation. We will solve 
this by introducing an offset in case q is odd. We propose
the following algorithm for sequences over $\mathcal{A}_q$, where $\oplus_{2q}$
denotes the addition over the integers, with a reduction
modulo 2q such that the final outcome is in $\mathcal{A}_q$.
PB Encoding Procedure

1) If q is odd, then determine a symbol a in $\mathcal{A}_q$ such that
$S_a(u)$ has the same parity as the length k of u, i.e.,
$S_a(u)$ and k are either both even or both odd.

2) If q is odd, then compute $u' = u \oplus_{2q} (-\mathbf{a})$, where
$\mathbf{a} = (a, a, \ldots, a)$ is of length k. If q is even, then u' = u.

3) Determine a polarity balancing index $z \in \{0, 1, \ldots, k-1\}$
for u'.

4) Multiply the first z positions of u' by -1 to obtain the
PB sequence x.

5) Map z (if q is even) or (a, z) (if q is odd) to a unique
PB prefix p.

Then transmit or store the balanced codeword c = (p, x).
PB Decoding Procedure

1) Retrieve the balancing index z from p.

2) Multiply the first z positions of x by -1 to retrieve u
(if q is even) or u' (if q is odd).

3) If q is odd, then retrieve a from the prefix p and compute

$$u = u' \oplus_{2q} \mathbf{a}.$$

Proof. It is easy to see that the operations in the encoding 
procedure are properly reversed in the decoding procedure. 
Hence, we only need to show the existence of (i) a suitable 
offset a (in case q odd) and (ii) a suitable polarity balancing 
index z. 

(i) The existence of $a$ can be demonstrated by supposing it does not exist and then deriving a contradiction. If $q$ and $k$ are odd, then $S_j(\mathbf{u})$ is odd for at least one symbol $j \in \mathcal{A}_q$, since all of them being even would imply that $k = \sum_j S_j(\mathbf{u})$ is even. If $q$ is odd and $k$ is even, then $S_j(\mathbf{u})$ is even for at least one $j \in \mathcal{A}_q$, since all of them being odd would imply that $k = \sum_j S_j(\mathbf{u})$, a summation of an odd number of odd terms, is odd.

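(ii) The existence of $z$ follows by an argument similar to that for the Knuth algorithm. Let $\mathbf{u}'_j$ denote the sequence $\mathbf{u}'$ with the first $j$ symbols multiplied by $-1$, and let $\phi$ be defined as in (18). For a PB balancing index $z$, it must hold that $\mathrm{Sum}(\phi(\mathbf{u}'_z)) = 0$. The existence of a PB balancing index follows by combining the following observations.

1) $\mathrm{Sum}(\phi(\mathbf{u}'_0))$ is even, since the number of non-zero symbols in $\mathbf{u}'$ is even.

2) $\mathrm{Sum}(\phi(\mathbf{u}'_j)) = \mathrm{Sum}(\phi(\mathbf{u}'_{j-1})) + c$ for all $j \in \{1, 2, \ldots, k\}$, where $c \in \{-2, 0, +2\}$.

3) $\mathrm{Sum}(\phi(\mathbf{u}'_k)) = -\mathrm{Sum}(\phi(\mathbf{u}'_0))$.

Hence, the series takes only even values, changes in steps of at most 2, and moves from an even value to its negative, so it attains 0 for some $j \in \{0, 1, \ldots, k-1\}$.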

□ 

Since there are $k$ possible values for $z$ and $q$ possible values for $a$, we have $p' = \log_q k$ if $q$ is even and $p' = 1 + \log_q k$ if $q$ is odd.
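The PB procedures above can be illustrated with the following Python sketch. It assumes lists of integers over $\mathcal{A}_q$, takes the smallest qualifying symbol as $a$ and the smallest balancing index as $z$, and omits the mapping to the prefix $\mathbf{p}$; all function names are hypothetical. The helper `check_all` exhaustively encodes and decodes every $\mathbf{u} \in (\mathcal{A}_q)^k$ for small parameters.

```python
from itertools import product

def alphabet(q):
    return list(range(-q + 1, q, 2))

def oplus(x, y, q):
    """Componentwise addition, reduced modulo 2q into A_q."""
    low = -q + 1
    return [low + (a + b - low) % (2 * q) for a, b in zip(x, y)]

def sign_sum(seq):
    """Sum(phi(seq)): number of positive minus number of negative symbols."""
    return sum((s > 0) - (s < 0) for s in seq)

def invert_prefix(seq, z):
    """Multiply the first z positions of seq by -1."""
    return [-s for s in seq[:z]] + list(seq[z:])

def pb_encode(u, q):
    """Return (a, z, x); a is None for even q. Prefix mapping is omitted."""
    k, a, u1 = len(u), None, list(u)
    if q % 2 == 1:
        # Step 1: a symbol whose number of occurrences has the parity of k
        a = next(s for s in alphabet(q) if u.count(s) % 2 == k % 2)
        # Step 2: u' = u (+)_{2q} (-a, ..., -a), so that a is mapped to 0
        u1 = oplus(u, [-a] * k, q)
    # Steps 3 and 4: smallest z in {0, ..., k-1} that balances the polarity
    for z in range(k):
        x = invert_prefix(u1, z)
        if sign_sum(x) == 0:
            return a, z, x
    raise ValueError("no PB index (for even q, k must be even)")

def pb_decode(a, z, x, q):
    u1 = invert_prefix(x, z)
    return u1 if a is None else oplus(u1, [a] * len(x), q)

def check_all(q, k):
    """Encode and decode every u in (A_q)^k, checking the PB property."""
    for u in product(alphabet(q), repeat=k):
        a, z, x = pb_encode(list(u), q)
        assert sign_sum(x) == 0 and pb_decode(a, z, x, q) == list(u)
    return True

print(pb_encode([4, 4, -2, 0, 0, 0, 0], 5))  # (-2, 6, [4, 4, 0, -2, -2, -2, 2])
```

For the data of Example 2 below, this sketch yields $a = -2$, $z = 6$, and the same PB sequence $\mathbf{x}$. An exhaustive run such as `check_all(5, 5)` is only a sanity check for small cases, not a substitute for the proof above.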

Example 2. Let q = 5. For the sequence 

$$\mathbf{u} = (+4, +4, -2, 0, 0, 0, 0) \in (\mathcal{A}_5)^7,$$

encoding goes as follows. 

1) Since $q = 5$ and $k = 7$ are odd, identify $-2$ as the symbol $a$ with an odd number of appearances in $\mathbf{u}$.

2) Subtract (modulo 10) the value -2 from every symbol in 
u, resulting in 

u' = (-4, -4,0, +2, +2, +2, +2). 

3) Find the PB index z to be 6. 

4) Multiply the first 6 positions of $\mathbf{u}'$ by $-1$ to obtain

x= (+4, +4,0, -2, -2, -2, +2). 

5) Uniquely map $(a, z) = (-2, 6)$ to one of the PB
sequences of length 4, e.g., 

p = (+2,0,0,-4). 

Then the balanced transmitted/stored sequence is 

c = (+2, 0, 0, -4, +4, +4, 0, -2, -2, -2, +2). 

C. Charge-Balanced Code Construction 

In [16], Swart and Weber presented a Knuth-like construction for $q$-ary CB codes over the alphabet $\mathbb{Z}_q$. We include it here, in a version for the alphabet $\mathcal{A}_q$, to make this paper self-contained. Furthermore, we need it in the subsequent subsection as a component for CPB code construction. The key ingredient of the CB method is a set of $qk$ balancing sequences $\mathbf{b}_i$, $i = 0, 1, \ldots, qk-1$, each consisting of $g$ symbols $j+2$ followed by $k-g$ symbols $j$, i.e.,

$$\mathbf{b}_i = \big((j+2)^{g}, j^{k-g}\big),$$

where $s^{t}$ denotes $t$ consecutive copies of the symbol $s$, $j = 2\lfloor i/k \rfloor$, and $g = i - k\lfloor i/k \rfloor$. Again, $\oplus_{2q}$ denotes the addition over the integers, with a reduction modulo $2q$ such that the final outcome is in $\mathcal{A}_q$. A charge balancing index is a number $z$ such that $\mathrm{Sum}(\mathbf{u} \oplus_{2q} \mathbf{b}_z) = 0$. The algorithm is described as follows.
CB Encoding Procedure 

1) Determine a CB index $z \in \{0, 1, \ldots, qk-1\}$ for the information sequence $\mathbf{u}$.

2) Compute the CB sequence $\mathbf{x} = \mathbf{u} \oplus_{2q} \mathbf{b}_z$.

3) Map $z$ to a unique CB prefix $\mathbf{p}$.

Then transmit or store the balanced codeword c = (p, x). 
CB Decoding Procedure 

1) Retrieve the balancing index z from p. 

2) Compute $\mathbf{u} = \mathbf{x} \oplus_{2q} (-\mathbf{b}_z)$.
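The balancing sequences $\mathbf{b}_i$ used in these procedures follow directly from $j$ and $g$. A minimal Python sketch (the function name `balancing` is hypothetical):

```python
def balancing(i, k):
    """b_i: g symbols j+2 followed by k-g symbols j,
    with j = 2*floor(i/k) and g = i - k*floor(i/k)."""
    j, g = 2 * (i // k), i % k
    return [j + 2] * g + [j] * (k - g)

# All q*k = 6 balancing sequences for q = 3 and k = 2
for i in range(6):
    print(i, balancing(i, 2))
# 0 [0, 0]
# 1 [2, 0]
# 2 [2, 2]
# 3 [4, 2]
# 4 [4, 4]
# 5 [6, 4]
```

Consecutive sequences differ in exactly one position, by $+2$. Evaluating the same formula at $i = qk$ gives $(2q, \ldots, 2q)$, which acts like $\mathbf{b}_0$ under $\oplus_{2q}$, in line with the convention $\mathbf{b}_{qk} = \mathbf{b}_0$ used in the proof. With $k = 7$, `balancing(32, 7)` returns `[10, 10, 10, 10, 8, 8, 8]`, the sequence used in Example 3.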

Proof. It is easy to see that the operation in the encoding 
procedure is properly reversed in the decoding procedure. 
Hence, we only need to show the existence of a CB index 
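for any information sequence $\mathbf{u}$ of length $k$. Define $\mathbf{b}_{qk} = \mathbf{b}_0$, and consider the series

$$\mathrm{Sum}(\mathbf{u} \oplus_{2q} \mathbf{b}_0), \mathrm{Sum}(\mathbf{u} \oplus_{2q} \mathbf{b}_1), \ldots, \mathrm{Sum}(\mathbf{u} \oplus_{2q} \mathbf{b}_{qk}).$$

We make the following observations.

1) The series starts and ends with the same even value.

2) For all $i \in \{0, 1, \ldots, qk-1\}$, it holds that

$$\mathrm{Sum}(\mathbf{u} \oplus_{2q} \mathbf{b}_{i+1}) = \mathrm{Sum}(\mathbf{u} \oplus_{2q} \mathbf{b}_i) + c,$$

where $c$ is either $+2$ or $-2q+2$.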
3) It holds that

$$\sum_{j=0}^{q-1} \mathrm{Sum}(\mathbf{u} \oplus_{2q} \mathbf{b}_{jk}) = \sum_{l=1}^{k} \sum_{j=0}^{q-1} (u_l \oplus_{2q} 2j) = k \sum_{j=0}^{q-1} (-q+1+2j) = 0,$$

where the first equality follows from the fact that the sequence $\mathbf{b}_{jk}$ consists of $k$ symbols $2j$, and the second equality from the observation that every position $l$ takes every symbol value from the alphabet $\mathcal{A}_q$ exactly once in the summation. Hence, the average value of all $\mathrm{Sum}(\mathbf{u} \oplus_{2q} \mathbf{b}_{jk})$, with $j = 0, 1, \ldots, q-1$, is 0.
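By combining these three observations, we can conclude that there exists at least one $z$ in $\{0, 1, \ldots, qk-1\}$ such that $\mathrm{Sum}(\mathbf{u} \oplus_{2q} \mathbf{b}_z) = 0$. Indeed, the series takes even values, can increase only in steps of $+2$, returns to its starting value, and attains values both at least and at most 0, so moving cyclically from a value at most 0 to a value at least 0 it must pass through 0. □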
Since there are $qk$ possible values for $z$, the unbalanced redundancy is $p' = 1 + \log_q k$. Note that by setting $q = 2$, we do not exactly get the original Knuth method as described in Subsection IV-A, where $p'$ is one bit less. The reason is that for the binary case, it can be shown (as done by Knuth and in Subsection IV-A) that there is always a suitable balancing index in a set of $k$ candidates (rather than $2k$). For further details, see [16]. Pelusi et al. [14] presented a slightly improved $q$-ary CB coding scheme, using $(q-1)k + (q \bmod 2)$ rather than $qk$ balancing functions, with the same asymptotic redundancy, though.
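A corresponding Python sketch of the CB procedures is given below. It chooses the smallest valid index, omits the prefix mapping, and assumes $q$ odd or $k$ even (for even $q$ and odd $k$, every sum of $k$ odd symbols is odd, so no CB sequence exists); the function names are hypothetical.

```python
from itertools import product

def alphabet(q):
    return list(range(-q + 1, q, 2))

def oplus(x, y, q):
    """Componentwise addition, reduced modulo 2q into A_q."""
    low = -q + 1
    return [low + (a + b - low) % (2 * q) for a, b in zip(x, y)]

def balancing(i, k):
    j, g = 2 * (i // k), i % k
    return [j + 2] * g + [j] * (k - g)

def cb_encode(u, q):
    """Return (z, x) with Sum(x) = 0, using the smallest CB index z."""
    k = len(u)
    for z in range(q * k):
        x = oplus(u, balancing(z, k), q)
        if sum(x) == 0:
            return z, x
    raise ValueError("no CB index (q odd or k even is required)")

def cb_decode(z, x, q):
    """u = x (+)_{2q} (-b_z)."""
    return oplus(x, [-b for b in balancing(z, len(x))], q)

def check_all(q, k):
    """Encode and decode every u in (A_q)^k, checking the CB property."""
    for u in product(alphabet(q), repeat=k):
        z, x = cb_encode(list(u), q)
        assert sum(x) == 0 and cb_decode(z, x, q) == list(u)
    return True

print(cb_encode([4, 4, -2, 0, 0, 0, 0], 5))  # (7, [-4, -4, 0, 2, 2, 2, 2])
```

On the data of Example 3 below, the sketch returns the smallest index, $z = 7$, rather than the $z = 32$ used there; both are valid CB indices.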

Example 3. We use the same information sequence as in 
Example 2, i.e.,

$$\mathbf{u} = (+4, +4, -2, 0, 0, 0, 0) \in (\mathcal{A}_5)^7.$$

Encoding into a CB sequence goes as follows. 

1) Find a suitable CB index z to be 32. 

2) Compute the CB sequence

$$\begin{aligned} \mathbf{x} &= \mathbf{u} \oplus_{10} \mathbf{b}_{32} \\ &= (+4, +4, -2, 0, 0, 0, 0) \oplus_{10} (10, 10, 10, 10, 8, 8, 8) \\ &= (+4, +4, -2, 0, -2, -2, -2). \end{aligned}$$

3) Uniquely map the CB index 32 to one of the CB 
sequences of length 4, e.g., 

p= (+4,0,-2,-2). 

Then the balanced transmitted/stored sequence is 

c = (+4, 0, -2, -2, +4, +4, -2, 0, -2, -2, -2) . 

Note that the sequence x generated this way is not PB. Rather 
than z = 32, we could also have chosen z = 7, but also then 
the resulting CB sequence 

x= (-4, -4,0, +2, +2, +2, +2) 

is not PB. 
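The statements about $z = 7$ and $z = 32$ can be checked with a short Python computation. The helper functions are hypothetical illustrations of $\oplus_{2q}$, of $\mathbf{b}_i$, and of $\mathrm{Sum}(\phi(\cdot))$ (positive minus negative symbols), not part of the construction.

```python
def oplus(x, y, q):
    low = -q + 1
    return [low + (a + b - low) % (2 * q) for a, b in zip(x, y)]

def balancing(i, k):
    j, g = 2 * (i // k), i % k
    return [j + 2] * g + [j] * (k - g)

def sign_sum(seq):
    """Number of positive minus number of negative symbols (0 iff PB)."""
    return sum((s > 0) - (s < 0) for s in seq)

u, q = [4, 4, -2, 0, 0, 0, 0], 5
k = len(u)
cb_indices = [z for z in range(q * k) if sum(oplus(u, balancing(z, k), q)) == 0]
print(min(cb_indices), 32 in cb_indices)  # 7 True
for z in (7, 32):
    x = oplus(u, balancing(z, k), q)
    print(z, x, sign_sum(x))
# 7 [-4, -4, 0, 2, 2, 2, 2] 2
# 32 [4, 4, -2, 0, -2, -2, -2] -2
```

Both candidate sequences have a nonzero polarity difference, so neither is PB.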



D. Charge & Polarity-Balanced Code Construction 

If $q \le 3$, then any code which is PB is also CB and vice versa. Hence, either of the coding strategies described in the previous two subsections provides CPB codes. However, for $q \ge 4$, the CB and PB properties are no longer equivalent, and a dedicated construction method is needed. Such a method will be proposed in this subsection, where we will assume throughout that $q \ge 4$ and that $k$ is even if $q$ is even.

For constructing codes having both the charge and polarity 
balancing properties, we can still base our constructions on the 
methods described in the previous two subsections. However, 
the straightforward strategy of first applying one method and 
then the other could fail, since the property obtained in the 
first round might be destroyed in the second. Therefore, a 
more sophisticated strategy should be developed. 

In the proposed method, we first transform the information sequence $\mathbf{u}$ into a PB sequence as described in Subsection IV-B. In this PB sequence, which we denote by $\mathbf{y}$, we focus on the subsequences $\mathbf{y}^+$, which consists of all positive symbols in $\mathbf{y}$, and $\mathbf{y}^-$, which consists of all negative symbols. Both subsequences have the same length (due to the established PB property), which we denote by $k'$. Note that

$$\mathrm{Sum}(\mathbf{y}^-) < 0 < \mathrm{Sum}(\mathbf{y}^+).$$

We are going to make modifications to $\mathbf{y}$, affecting only $\mathbf{y}^+$ and $\mathbf{y}^-$, such that the resulting sequence $\mathbf{x}$ satisfies

$$\mathrm{Sum}(\mathbf{x}^+) + \mathrm{Sum}(\mathbf{x}^-) = 0, \tag{27}$$

which implies that $\mathbf{x}$ is CPB.

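The modifications are done in such a way that the polarity of all involved symbols will not change. Hence, like $\mathbf{y}$, the sequence $\mathbf{x}$ is PB. The first step of the modification process consists of a possible 'mirror' operation on the symbols in $\mathbf{y}^+$ (with respect to the value $\lceil q/2 \rceil$). Define

$$\xi = \begin{cases} 1, & \text{if } \mathrm{Sum}(\mathbf{y}^+) < k'\lceil q/2 \rceil < -\mathrm{Sum}(\mathbf{y}^-) \text{ or } -\mathrm{Sum}(\mathbf{y}^-) < k'\lceil q/2 \rceil < \mathrm{Sum}(\mathbf{y}^+), \\ 0, & \text{otherwise.} \end{cases} \tag{28}$$

If $\xi = 1$, then all symbols $y_i$ in $\mathbf{y}^+$ are replaced by $2\lceil q/2 \rceil - y_i$; else they are left untouched. Note that for the sequence $\mathbf{z}$ obtained from $\mathbf{y}$ by this operation, it holds that $\mathrm{Sum}(\mathbf{z}^+)$ and $-\mathrm{Sum}(\mathbf{z}^-)$ are both at least equal to $k'\lceil q/2 \rceil$ or both at most equal to this value. Define

$$\nu = \begin{cases} +, & \text{if } \mathrm{Sum}(\mathbf{z}^+) \ge -\mathrm{Sum}(\mathbf{z}^-) \ge k'\lceil q/2 \rceil \text{ or } \mathrm{Sum}(\mathbf{z}^+) \le -\mathrm{Sum}(\mathbf{z}^-) \le k'\lceil q/2 \rceil, \\ -, & \text{otherwise.} \end{cases} \tag{29}$$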
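Definitions (28) and (29), together with the mirror operation, can be sketched in Python as follows. The sketch assumes $\mathbf{y}$ is already PB and encodes $\nu$ as the string `'+'` or `'-'`; the function name `mirror_and_choose` is hypothetical. It is applied here to the PB sequence obtained in Example 2.

```python
def mirror_and_choose(y, q):
    """Return (xi, nu, z) for a PB sequence y over A_q, per (28) and (29)."""
    ceil_half = (q + 1) // 2               # ceil(q/2)
    kp = sum(s > 0 for s in y)             # k' = |y+| = |y-|
    M = kp * ceil_half                     # k' * ceil(q/2)
    P = sum(s for s in y if s > 0)         # Sum(y+)
    N = -sum(s for s in y if s < 0)        # -Sum(y-)
    xi = 1 if (P < M < N or N < M < P) else 0                # (28)
    z = [2 * ceil_half - s if (xi and s > 0) else s for s in y]
    P = sum(s for s in z if s > 0)         # Sum(z+); z- equals y-
    nu = '+' if (P >= N >= M or P <= N <= M) else '-'        # (29)
    return xi, nu, z

print(mirror_and_choose([4, 4, 0, -2, -2, -2, 2], 5))
# (1, '-', [2, 2, 0, -2, -2, -2, 4])
```

After mirroring, $\mathrm{Sum}(\mathbf{z}^+) = 8$ and $-\mathrm{Sum}(\mathbf{z}^-) = 6$ both lie below $k'\lceil q/2 \rceil = 9$, as claimed above, and (29) then selects $\nu = -$.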

In the second (and last) step of the modification process, we change either the positive or the negative symbols in $\mathbf{z}$, in a manner similar to that used in Subsection IV-C. Consider $\lfloor q/2 \rfloor k'$ balancing sequences

$$\mathbf{b}_i = \big((j+2)^{g}, j^{k'-g}\big),$$

$i = 0, 1, \ldots, \lfloor q/2 \rfloor k' - 1$, where $j = 2\lfloor i/k' \rfloor$ and $g = i - k'\lfloor i/k' \rfloor$. Throughout the rest of this subsection, let $\oplus$ denote the addition over the integers, with a reduction modulo $2\lfloor q/2 \rfloor$ such that the final outcome is in $\mathcal{A}_q^+ = \{j \in \mathcal{A}_q : j > 0\}$ if $\nu = +$ and in $\mathcal{A}_q^- = \{j \in \mathcal{A}_q : j < 0\}$ if $\nu = -$. We replace $\mathbf{z}^\nu$ by $\mathbf{z}^\nu \oplus \mathbf{b}_w$, where $w$ is chosen such that

$$\mathrm{Sum}(\mathbf{z}^\nu \oplus \mathbf{b}_w) = -\mathrm{Sum}(\mathbf{z}^{\bar{\nu}}), \tag{30}$$

where $\bar{\nu}$ denotes the inverse of $\nu$. In conclusion, the resulting sequence $\mathbf{x}$ satisfies (27).

In summary, we have the following algorithm in case $q \ge 4$.
CPB Encoding Procedure

1) Apply the encoding procedure from Subsection IV-B to change the information sequence $\mathbf{u}$ into a PB sequence $\mathbf{y}$, using appropriate offset $a$ (if $q$ is odd) and PB index $z$.

2) Compute $\xi$ by (28).

3) If $\xi = 1$, then replace all symbols $y_i$ in $\mathbf{y}^+$ by $2\lceil q/2 \rceil - y_i$; else leave them untouched. Call the resulting sequence $\mathbf{z}$.

4) Compute $\nu$ by (29).

5) Determine an index $w$ such that (30) is satisfied.

6) Replace in $\mathbf{z}$ the subsequence $\mathbf{z}^\nu$ by $\mathbf{z}^\nu \oplus \mathbf{b}_w$, to obtain the CPB sequence $\mathbf{x}$.

7) Map $(z, \xi, \nu, w)$ (if $q$ is even) or $(a, z, \xi, \nu, w)$ (if $q$ is odd) to a unique CPB prefix $\mathbf{p}$.

Then transmit or store the balanced codeword c = (p, x). 
CPB Decoding Procedure 

1) Retrieve $a$ (if $q$ is odd), $z$, $\xi$, $\nu$, and $w$ from the prefix $\mathbf{p}$.

2) Replace $\mathbf{x}^\nu$ by $\mathbf{x}^\nu \oplus (-\mathbf{b}_w)$ in $\mathbf{x}$ to obtain $\mathbf{z}$.

3) If $\xi = 1$, then replace all symbols $z_i$ in $\mathbf{z}^+$ by $2\lceil q/2 \rceil - z_i$; else leave them untouched. Call the resulting sequence $\mathbf{y}$.

4) Apply the decoding procedure from Subsection IV-B to retrieve $\mathbf{u}$ from $\mathbf{y}$, using $a$ (if $q$ is odd) and $z$.
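Combining the PB step with the two modification steps, the CPB procedures can be sketched in Python as below. This is a minimal illustration under stated assumptions: $q \ge 4$, $k$ even if $q$ is even, the smallest valid $a$, $z$, and $w$ are chosen, $\nu$ is the string `'+'` or `'-'`, and the prefix mapping is omitted. The edge case $k' = 0$ (an all-zero $\mathbf{y}$) is handled here by assumption, taking $w = 0$ with an empty balancing sequence. All function names are hypothetical.

```python
from itertools import product

def alphabet(q):
    return list(range(-q + 1, q, 2))

def reduce_into(v, low, mod):
    """Reduce the integer v modulo mod into the range [low, low + mod)."""
    return low + (v - low) % mod

def sign_sum(seq):
    """Number of positive symbols minus number of negative symbols."""
    return sum((s > 0) - (s < 0) for s in seq)

def invert_prefix(seq, z):
    return [-s for s in seq[:z]] + list(seq[z:])

def pb_encode(u, q):
    """PB step of Subsection IV-B: returns (a, z, y); a is None for even q."""
    k, a, u1 = len(u), None, list(u)
    if q % 2 == 1:
        a = next(s for s in alphabet(q) if u.count(s) % 2 == k % 2)
        u1 = [reduce_into(s - a, -q + 1, 2 * q) for s in u]
    z = next(i for i in range(k) if sign_sum(invert_prefix(u1, i)) == 0)
    return a, z, invert_prefix(u1, z)

def pb_decode(a, z, y, q):
    u1 = invert_prefix(y, z)
    return u1 if a is None else [reduce_into(s + a, -q + 1, 2 * q) for s in u1]

def balancing(i, kp):
    """b_i of length k': g symbols j+2, then k'-g symbols j."""
    j, g = 2 * (i // kp), i % kp
    return [j + 2] * g + [j] * (kp - g)

def has_pol(s, nu):
    return s > 0 if nu == '+' else s < 0

def add_on(seq, nu, b, q, sign=1):
    """Add sign*b to the nu-polarity symbols of seq (in order), reducing
    modulo 2*floor(q/2) into A_q^+ (nu = '+') or A_q^- (nu = '-')."""
    half = q // 2
    low = q + 1 - 2 * half if nu == '+' else -q + 1
    out, it = list(seq), iter(b)
    for pos, s in enumerate(seq):
        if has_pol(s, nu):
            out[pos] = reduce_into(s + sign * next(it), low, 2 * half)
    return out

def mirror(seq, q):
    """Replace each positive symbol s by 2*ceil(q/2) - s."""
    c = 2 * ((q + 1) // 2)
    return [c - s if s > 0 else s for s in seq]

def cpb_encode(u, q):
    a, z, y = pb_encode(u, q)
    kp = sum(s > 0 for s in y)                          # k'
    M = kp * ((q + 1) // 2)                             # k' * ceil(q/2)
    P, N = sum(s for s in y if s > 0), -sum(s for s in y if s < 0)
    xi = 1 if (P < M < N or N < M < P) else 0           # (28)
    zz = mirror(y, q) if xi else y
    P = sum(s for s in zz if s > 0)                     # Sum(z+)
    nu = '+' if (P >= N >= M or P <= N <= M) else '-'   # (29)
    target = N if nu == '+' else -P                     # -Sum(z^{nu bar})
    for w in range(max(1, (q // 2) * kp)):              # k' = 0: only w = 0
        x = add_on(zz, nu, balancing(w, kp) if kp else [], q)
        if sum(s for s in x if has_pol(s, nu)) == target:   # (30)
            return a, z, xi, nu, w, x
    raise AssertionError("no balancing index w found")

def cpb_decode(a, z, xi, nu, w, x, q):
    kp = sum(s > 0 for s in x)
    zz = add_on(x, nu, balancing(w, kp) if kp else [], q, sign=-1)
    return pb_decode(a, z, mirror(zz, q) if xi else zz, q)

def check_all(q, k):
    """Exhaustively check CPB encoding/decoding for all u in (A_q)^k."""
    for u in product(alphabet(q), repeat=k):
        u = list(u)
        a, z, xi, nu, w, x = cpb_encode(u, q)
        assert sum(x) == 0 and sign_sum(x) == 0
        assert cpb_decode(a, z, xi, nu, w, x, q) == u
    return True

print(cpb_encode([4, 4, -2, 0, 0, 0, 0], 5))
# (-2, 6, 1, '-', 1, [2, 2, 0, -4, -2, -2, 4])
```

The printed tuple matches the values $(a, z, \xi, \nu, w)$ and the sequence $\mathbf{x}$ of Example 4 below, and `check_all(q, k)` verifies for small parameters that a suitable $w$ always exists and that decoding inverts encoding.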

Proof. It is easy to see that the operations in the encoding 
procedure are properly reversed in the decoding procedure. 
Further, the validity of the PB part was already demonstrated 
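in Subsection IV-B. Hence, the only thing left to prove is that there always exists a suitable index $w$. To this end, define $\mathbf{b}_{\lfloor q/2 \rfloor k'} = \mathbf{b}_0$ and consider the series

$$\mathrm{Sum}(\mathbf{z}^\nu \oplus \mathbf{b}_0), \mathrm{Sum}(\mathbf{z}^\nu \oplus \mathbf{b}_1), \ldots, \mathrm{Sum}(\mathbf{z}^\nu \oplus \mathbf{b}_{\lfloor q/2 \rfloor k'}).$$

We make the following observations.

1) The series starts and ends with the same value.

2) For all $i \in \{0, 1, \ldots, \lfloor q/2 \rfloor k' - 1\}$, it holds that

$$\mathrm{Sum}(\mathbf{z}^\nu \oplus \mathbf{b}_{i+1}) = \mathrm{Sum}(\mathbf{z}^\nu \oplus \mathbf{b}_i) + c,$$

where $c$ is either $+2$ or $-2\lfloor q/2 \rfloor + 2$.

3) For $\nu = +$, it holds that

$$\sum_{j=0}^{\lfloor q/2 \rfloor - 1} \mathrm{Sum}(\mathbf{z}^\nu \oplus \mathbf{b}_{jk'}) = k' \sum_{j=0}^{\lfloor q/2 \rfloor - 1} (q-1-2j) = \lfloor q/2 \rfloor k' \lceil q/2 \rceil.$$

Hence, the average value of all $\mathrm{Sum}(\mathbf{z}^\nu \oplus \mathbf{b}_{jk'})$, with $j = 0, 1, \ldots, \lfloor q/2 \rfloor - 1$, is $k'\lceil q/2 \rceil$. For $\nu = -$, the same holds with all signs reversed, i.e., the average value is $-k'\lceil q/2 \rceil$.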
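The closed form in the third observation is an arithmetic series. Writing $n = \lfloor q/2 \rfloor$ (notation introduced only for this derivation) and using $q - n = \lceil q/2 \rceil$, one obtains

$$\sum_{j=0}^{n-1} (q-1-2j) = n(q-1) - n(n-1) = n(q-n) = \lfloor q/2 \rfloor \lceil q/2 \rceil.$$

For instance, for $q = 5$ the sum is $4 + 2 = 6 = 2 \cdot 3$, so with $k' = 3$ (as in Example 4) the average value $k'\lceil q/2 \rceil$ equals 9.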



By combining these three observations and (29), we can conclude that there exists at least one $w$ in $\{0, 1, \ldots, \lfloor q/2 \rfloor k' - 1\}$ such that (30) is satisfied. □
Since there are $q$ possible values for $a$, $k$ for $z$, 2 for $\xi$, 2 for $\nu$, and $\lfloor q/2 \rfloor k' \le \lfloor q/2 \rfloor \lfloor k/2 \rfloor$ for $w$, it is sufficient to choose the prefix length such that

$$P = 4k\lfloor q/2 \rfloor \lfloor k/2 \rfloor = qk^2$$

CPB sequences can be accommodated if $q$ is even, and

$$P = 4qk\lfloor q/2 \rfloor \lfloor k/2 \rfloor = 2q(q-1)k\lfloor k/2 \rfloor$$

if $q$ is odd. Hence, the unbalanced redundancy is

$$p' = \log_q P = 1 + 2\log_q k$$

if $q$ is even, and very close to that number if $q$ is odd.

Example 4. We use the same information sequence as in 
Examples 2 and 3, i.e.,

$$\mathbf{u} = (+4, +4, -2, 0, 0, 0, 0) \in (\mathcal{A}_5)^7.$$

Encoding into a CPB sequence goes as follows. 

1) From Example 2, the PB sequence

y = (+4, +4, 0,-2, -2, -2, +2) 

is obtained. 

2) Find $\xi = 1$, since

$$-\mathrm{Sum}(\mathbf{y}^-) = 6 < 9 < 10 = \mathrm{Sum}(\mathbf{y}^+).$$

3) Mirror the positive values in y with respect to +3 to 
obtain 

z = (+2, +2, 0,-2, -2, -2, +4). 

4) Find $\nu = -$, since

$$-\mathrm{Sum}(\mathbf{z}^-) = 6 < 8 = \mathrm{Sum}(\mathbf{z}^+) \le 9.$$

5) Determine w = 1 as a suitable balancing index. 

6) Add (modulo 4, with the resulting symbols in the set $\{-4, -2\}$) the sequence $\mathbf{b}_1 = (2, 0, 0)$ to $\mathbf{z}^-$, i.e., compute

$$\begin{aligned} \mathbf{x} &= (+2, +2, 0, -2, -2, -2, +4) \oplus (0, 0, 0, 2, 0, 0, 0) \\ &= (+2, +2, 0, -4, -2, -2, +4). \end{aligned}$$

7) Uniquely map $(a, z, \xi, \nu, w) = (-2, 6, 1, -, 1)$ to one of the CPB sequences of length 6, e.g.,

p= (+4, +2, -2, -4, +4, -4). 

Then the CPB transmitted/stored sequence is 

c = (+4, +2, -2, -4, +4, -4, +2, +2, 0, -4, -2, -2, +4). 
E. Symbol-Balanced Code Construction

At first sight, the Knuth approach may seem to be less suitable for generating $q$-ary SB sequences than for CB and PB sequences. Still, Mascella and Tallini presented Knuth-like SB construction methods which are based on maps exchanging alphabet symbols [10], [11]. By applying $q-1$ such maps, each guaranteeing that a particular symbol appears the desired number of times, symbol balancing is achieved. Here, we present another Knuth-like SB method which is similar to this Mascella-Tallini approach in the sense that it also operates in $q-1$ rounds, but is different in the sense that it adds in each round an appropriate balancing sequence to the data sequence, rather than performing specific symbol exchanges. Hence, our method is more in the spirit of the constructions presented in the previous subsections.

In order to encode a data sequence $\mathbf{u}$ of length $k = qm$ into an SB sequence $\mathbf{x}$, we propose the following Knuth-like algorithm. It consists of $q-1$ rounds, numbered $1, 2, \ldots, q-1$, in which we will perform simple reversible manipulations on the data sequence, such that the end result is SB. In round $v$, we force there to be exactly $m = k/q$ symbols $-q-1+2v$ in the sequence, a condition that will not change anymore in the next rounds. For $v = 1, 2, \ldots, q-1$, let

$$\mathcal{A}_q^v = \{-q-1+2v, -q+1+2v, \ldots, q-1\},$$

i.e., $\mathcal{A}_q^v$ is the sub-alphabet consisting of the $q+1-v$ largest elements of the alphabet $\mathcal{A}_q$,

$$M_v(\mathbf{y}) = \max\{j \in \mathcal{A}_q^v : S_j(\mathbf{y}) \ge S_i(\mathbf{y}) \ \forall i \in \mathcal{A}_q^v\}, \tag{31}$$

and

$$m_v(\mathbf{y}) = \min\{j \in \mathcal{A}_q^v : S_j(\mathbf{y}) \le S_i(\mathbf{y}) \ \forall i \in \mathcal{A}_q^v\}, \tag{32}$$

where $\mathbf{y}$ is a sequence over the alphabet $\mathcal{A}_q$. Note that, for all $v$, $M_v(\mathbf{y})$ is a symbol from $\mathcal{A}_q^v$ appearing most frequently in $\mathbf{y}$, while $m_v(\mathbf{y})$ is a symbol from $\mathcal{A}_q^v$ appearing least frequently in $\mathbf{y}$.
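Definitions (31) and (32) translate directly into code. The following Python sketch, with hypothetical function names, breaks ties exactly as written there (largest most-frequent symbol, smallest least-frequent symbol) and is evaluated on the sequences $\mathbf{u}_0$ and $\mathbf{u}_1$ of Example 5.

```python
def sub_alphabet(q, v):
    """A_q^v = {-q-1+2v, -q+1+2v, ..., q-1}: the q+1-v largest symbols of A_q."""
    return list(range(-q - 1 + 2 * v, q, 2))

def most_frequent(y, q, v):
    """M_v(y) of (31): the largest symbol of A_q^v with maximal count in y."""
    A = sub_alphabet(q, v)
    top = max(y.count(j) for j in A)
    return max(j for j in A if y.count(j) == top)

def least_frequent(y, q, v):
    """m_v(y) of (32): the smallest symbol of A_q^v with minimal count in y."""
    A = sub_alphabet(q, v)
    low = min(y.count(j) for j in A)
    return min(j for j in A if y.count(j) == low)

u0 = [0, -2, -2, -2, 0, -2]
u1 = [2, 0, 0, -2, 0, -2]
print(most_frequent(u0, 3, 1), least_frequent(u0, 3, 1))  # -2 2
print(most_frequent(u1, 3, 2), least_frequent(u1, 3, 2))  # 0 2
```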

The algorithm is described as follows. 
SB Encoding Procedure 

1) Set $\mathbf{u}_0 = \mathbf{u}$ and $v = 1$.

2) Set $m_v = m_v(\mathbf{u}_{v-1})$, $M_v = M_v(\mathbf{u}_{v-1})$, and create $\mathbf{u}_v$ from $\mathbf{u}_{v-1} = (h_1, h_2, \ldots, h_k)$ by leaving all $h_i \notin \mathcal{A}_q^v$ unchanged and adding the value

$$\begin{cases} -q-1+2v-m_v, & \text{if } i \le i_v, \\ -q-1+2v-M_v, & \text{if } i > i_v, \end{cases} \tag{33}$$

to the $h_i \in \mathcal{A}_q^v$. The addition is done modulo $2q+2-2v$ such that the resulting symbol is in $\mathcal{A}_q^v$. The value $i_v \in \{0, 1, \ldots, k\}$ is chosen such that

$$S_{-q-1+2v}(\mathbf{u}_v) = k/q = m. \tag{34}$$

3) If $v < q-1$, then increase $v$ by one and go back to the previous step.

4) Set $\mathbf{x} = \mathbf{u}_{q-1}$, which is SB, and map

$$(i_1, \ldots, i_{q-1}, m_1, \ldots, m_{q-1}, M_1, \ldots, M_{q-1})$$

to a unique SB prefix $\mathbf{p}$.
Then transmit or store the SB codeword $\mathbf{c} = (\mathbf{p}, \mathbf{x})$.



SB Decoding Procedure 

1) Retrieve

$$(i_1, \ldots, i_{q-1}, m_1, \ldots, m_{q-1}, M_1, \ldots, M_{q-1})$$

from $\mathbf{p}$ and set $\mathbf{x}_q = \mathbf{x}$ and $v = q-1$.

2) Create $\mathbf{x}_v$ from $\mathbf{x}_{v+1} = (h_1, h_2, \ldots, h_k)$ by leaving all $h_i \notin \mathcal{A}_q^v$ unchanged, and subtracting the value as given in (33) from the $h_i \in \mathcal{A}_q^v$. The subtraction is done modulo $2q+2-2v$ such that the resulting symbol is in $\mathcal{A}_q^v$.

3) If $v > 1$, then decrease $v$ by one and go back to the previous step.

4) Set $\mathbf{u} = \mathbf{x}_1$.
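The SB procedures can be sketched in Python as follows. This minimal sketch assumes $k$ is a multiple of $q$, picks the smallest valid $i_v$ in each round, represents the prefix content as a list of triples $(i_v, m_v, M_v)$, and omits the mapping to $\mathbf{p}$; the function names are hypothetical.

```python
from itertools import product

def alphabet(q):
    return list(range(-q + 1, q, 2))

def sub_alphabet(q, v):
    return list(range(-q - 1 + 2 * v, q, 2))

def most_frequent(y, q, v):     # M_v(y), (31)
    A = sub_alphabet(q, v)
    top = max(y.count(j) for j in A)
    return max(j for j in A if y.count(j) == top)

def least_frequent(y, q, v):    # m_v(y), (32)
    A = sub_alphabet(q, v)
    low = min(y.count(j) for j in A)
    return min(j for j in A if y.count(j) == low)

def sb_round(h, q, v, i_v, m_v, M_v, sign=1):
    """Apply (33) to the symbols of h in A_q^v (sign=-1 undoes it)."""
    A = set(sub_alphabet(q, v))
    t = -q - 1 + 2 * v                   # symbol fixed in round v
    mod = 2 * q + 2 - 2 * v
    out = list(h)
    for i, s in enumerate(h, start=1):
        if s in A:
            d = t - (m_v if i <= i_v else M_v)
            out[i - 1] = t + (s + sign * d - t) % mod
    return out

def sb_encode(u, q):
    """Return ([(i_v, m_v, M_v) for v = 1..q-1], x)."""
    k, cur, params = len(u), list(u), []
    m = k // q
    for v in range(1, q):
        m_v, M_v = least_frequent(cur, q, v), most_frequent(cur, q, v)
        t = -q - 1 + 2 * v
        i_v = next(i for i in range(k + 1)
                   if sb_round(cur, q, v, i, m_v, M_v).count(t) == m)  # (34)
        cur = sb_round(cur, q, v, i_v, m_v, M_v)
        params.append((i_v, m_v, M_v))
    return params, cur

def sb_decode(params, x, q):
    cur = list(x)
    for v in range(q - 1, 0, -1):
        cur = sb_round(cur, q, v, *params[v - 1], sign=-1)
    return cur

def check_all(q, k):
    """Exhaustively check SB encoding/decoding for all u in (A_q)^k."""
    for u in product(alphabet(q), repeat=k):
        params, x = sb_encode(list(u), q)
        assert all(x.count(s) == k // q for s in alphabet(q))
        assert sb_decode(params, x, q) == list(u)
    return True

print(sb_encode([0, -2, -2, -2, 0, -2], 3))
# ([(3, 2, -2), (3, 2, 0)], [0, 2, 2, -2, 0, -2])
```

On the data of Example 5 below, the sketch finds $i_1 = i_2 = 3$ and the SB sequence $\mathbf{x} = (0, +2, +2, -2, 0, -2)$ given there.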

Proof. By construction we have

$$S_{-q-1+2v}(\mathbf{u}_w) = S_{-q-1+2v}(\mathbf{u}_v)$$

for all $1 \le v < w \le q-1$, and thus it follows from (34) that all symbols from $\mathcal{A}_q$ appear equally often in $\mathbf{x} = \mathbf{u}_{q-1}$, and thus $\mathbf{x}$ is SB. Further, it is easy to see that the operations in the encoding procedure are properly reversed in the decoding procedure. Hence, the only thing left to show is that for all $v = 1, 2, \ldots, q-1$ there always exists at least one $i_v$ such that (34) is satisfied. From (31) and (32), it follows that $S_{m_1}(\mathbf{u}_0) \le m \le S_{M_1}(\mathbf{u}_0)$, and thus

$$S_{-q+1}(\mathbf{u}_1) = S_{M_1}(\mathbf{u}_0) \ge m \quad \text{if } i_1 = 0,$$

while

$$S_{-q+1}(\mathbf{u}_1) = S_{m_1}(\mathbf{u}_0) \le m \quad \text{if } i_1 = k.$$

Since increasing or decreasing $i_1$ by 1 changes $S_{-q+1}(\mathbf{u}_1)$ by $-1$, 0, or $+1$, we can conclude that $S_{-q+1}(\mathbf{u}_1) = m$ for at least one $i_1 \in \{0, 1, \ldots, k\}$. Similarly, for $v > 1$, we have

$$S_{m_v}(\mathbf{u}_{v-1}) \le \frac{k-(v-1)m}{q-v+1} = m \le S_{M_v}(\mathbf{u}_{v-1}),$$

where the middle term is the average number of occurrences in $\mathbf{u}_{v-1}$ of the $q-v+1$ symbols of $\mathcal{A}_q^v$, and thus

$$S_{-q-1+2v}(\mathbf{u}_v) = S_{M_v}(\mathbf{u}_{v-1}) \ge m \quad \text{if } i_v = 0,$$

while

$$S_{-q-1+2v}(\mathbf{u}_v) = S_{m_v}(\mathbf{u}_{v-1}) \le m \quad \text{if } i_v = k,$$

and so $S_{-q-1+2v}(\mathbf{u}_v) = m$ for at least one value $i_v \in \{0, 1, \ldots, k\}$. □
Note that there are at most $(k+1)^{q-1}$ possible realizations of $(i_1, \ldots, i_{q-1})$, $q!$ possible realizations of $(m_1, \ldots, m_{q-1})$, and $q!$ possible realizations of $(M_1, \ldots, M_{q-1})$ (since $m_v, M_v \in \mathcal{A}_q^v$, which has $q+1-v$ elements). Hence, an unbalanced redundancy of

$$p' = (q-1)\log_q(k+1) + 2\log_q(q!)$$

suffices. We conclude that, as for the Mascella-Tallini constructions [10], [11], the redundancy of this Knuth-like SB method exceeds the minimum redundancy by a factor of two for long codes.

Example 5. Let $q = 3$ and $k = 6$, and thus the symbol frequency should be $m = 6/3 = 2$. The data sequence is given to be

$$\mathbf{u}_0 = \mathbf{u} = (0, -2, -2, -2, 0, -2).$$

Hence, $S_{-2}(\mathbf{u}_0) = 4$, $S_0(\mathbf{u}_0) = 2$, $S_{+2}(\mathbf{u}_0) = 0$, and thus $M_1 = -2$ (the most frequent symbol) and $m_1 = +2$ (the least frequent symbol). According to (33), in the first round ($v = 1$), the number of $-2$ symbols is forced to be 2 by modulo-6 adding $-4$ to the first $i_1$ symbols of $\mathbf{u}_0$ and $0$ to the last $6 - i_1$ symbols. Choosing $i_1 = 3$ gives

$$\mathbf{u}_1 = (+2, 0, 0, -2, 0, -2).$$

Note that $S_{-2}(\mathbf{u}_1) = 2$, $S_0(\mathbf{u}_1) = 3$, $S_{+2}(\mathbf{u}_1) = 1$, and thus $M_2 = 0$ and $m_2 = +2$. In the next round ($v = 2$), the number of zeroes is forced to be 2 by modulo-4 adding $-2$ to the first $i_2$ symbols of $\mathbf{u}_1$ and $0$ to the last $6 - i_2$ symbols, except when the symbol is equal to $-2$, in which case we leave it unchanged. Choosing $i_2 = 3$ gives

$$\mathbf{u}_2 = (0, +2, +2, -2, 0, -2).$$

Note that $S_{-2}(\mathbf{u}_2) = S_0(\mathbf{u}_2) = S_{+2}(\mathbf{u}_2) = 2$, and thus $\mathbf{x} = \mathbf{u}_2$ is SB.

F. Discussion 

In the previous subsections, we have presented generalizations of Knuth's binary/bipolar balancing algorithm to larger alphabets, for the various balancing perspectives under consideration in this paper. Examples have been provided to illustrate the (encoding) procedures. It should be mentioned that these examples are misleading in the sense that the redundancy appears to be relatively large, which is due to the fact that extremely short data blocks were used in the examples. For instance, in Example 2, four redundant symbols are used for seven data symbols. However, for long codes, the redundancy is only logarithmic in the length of the data block. For all the constructions presented in this section, the redundancy is roughly twice the corresponding minimum redundancy derived in Section III.

For the binary case, modifications of Knuth's method have been presented to close the factor-of-two gap between the redundancy of the original Knuth algorithm and the minimum redundancy, while maintaining sufficient simplicity to enable feasible implementations. In [8], this is done by a more efficient (variable-length) encoding of the prefix. In [21], minimum redundancy is achieved by exploiting the fact that many data sequences have more than one possible balancing index, thus allowing auxiliary data to be encoded through the choice of the index. It is an interesting research challenge to investigate whether such techniques are also applicable in non-binary cases.

V. Conclusions 

In this paper we have considered balancing of $q$-ary sequences
from various perspectives. In particular, we have reviewed 
the symbol balancing and charge balancing concepts, and 
introduced the polarity balancing concept, also in combination 
with charge balancing. For each of these perspectives, we 
have derived (approximate) expressions for the number of such 
sequences of a fixed length and for the minimum redundancy. 
The major conclusions of this analysis have been summarized 
in Table II, which shows the minimum redundancy normalized
to the logarithm of the block length $n$ in the limit as $n \to \infty$.
Furthermore, we have presented for each of the balancing 
perspectives a $q$-ary coding scheme in the spirit of the binary
Knuth algorithm. These schemes allow for simple encoding 
and decoding, at the price of a redundancy which is twice the 
minimum required redundancy. 